2023-11-25 20:20:52,064 INFO [train_asr.py:1303] (1/4) Training started
2023-11-25 20:20:52,064 INFO [train_asr.py:1313] (1/4) Device: cuda:1
2023-11-25 20:20:52,078 INFO [train_asr.py:1325] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'a9ea720f-dirty', 'icefall-git-date': 'Wed Nov 22 17:48:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-10-1125112954-6d844cbdd8-m6xmg', 'IP address': '10.177.94.19'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 39, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'stop_early': False, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'beats_label': True, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
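
Note: the per-batch `loss` values reported further down combine the component losses using the scales in the config above (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0; use_ctc is False, so ctc_loss_scale is unused). A minimal Python sketch of that bookkeeping, consistent with the logged numbers but not copied from train_asr.py:

    # Sketch of how the logged `loss` relates to its components, assuming the
    # scales from the config above. It reproduces the logged totals exactly,
    # e.g. batch 0 below: 0.5 * 0.1088 + 0.01357 + 1.0 * 0.06306 ~= 0.1311.
    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_loss_scale: float = 0.5,
                      audio_tagging_loss_scale: float = 1.0) -> float:
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)
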
2023-11-25 20:20:52,079 INFO [train_asr.py:1334] (1/4) About to create model
2023-11-25 20:20:52,776 INFO [train_asr.py:1338] (1/4) Number of model parameters: 65819362
2023-11-25 20:20:52,776 INFO [train_asr.py:1362] (1/4) Using CED labels!
2023-11-25 20:20:52,776 INFO [checkpoint.py:112] (1/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-38.pt
2023-11-25 20:20:56,541 INFO [train_asr.py:1370] (1/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-25 20:21:00,072 INFO [train_asr.py:1379] (1/4) Using DDP
2023-11-25 20:21:00,337 INFO [train_asr.py:1402] (1/4) Loading optimizer state dict
2023-11-25 20:21:00,825 INFO [train_asr.py:1410] (1/4) Loading scheduler state dict
2023-11-25 20:21:00,877 INFO [train_asr.py:1432] (1/4) Getting audioset cuts
2023-11-25 20:21:00,877 INFO [kd_datamodule.py:784] (1/4) About to get the audioset cuts.
2023-11-25 20:21:00,964 INFO [train_asr.py:1438] (1/4) Using mux to combine Librispeech with audioset
2023-11-25 20:21:00,964 INFO [train_asr.py:1449] (1/4) CutSet(len=2748469) [underlying data type: ]
2023-11-25 20:21:10,005 INFO [kd_datamodule.py:396] (1/4) Enable MUSAN
2023-11-25 20:21:10,006 INFO [kd_datamodule.py:397] (1/4) About to get Musan cuts
2023-11-25 20:21:12,639 INFO [kd_datamodule.py:427] (1/4) Enable SpecAugment
2023-11-25 20:21:12,639 INFO [kd_datamodule.py:428] (1/4) Time warp factor: 80
2023-11-25 20:21:12,640 INFO [kd_datamodule.py:438] (1/4) Num frame mask: 10
2023-11-25 20:21:12,640 INFO [kd_datamodule.py:451] (1/4) About to create train dataset
2023-11-25 20:21:12,641 INFO [kd_datamodule.py:487] (1/4) Using SimpleCutSampler
2023-11-25 20:21:12,642 INFO [kd_datamodule.py:495] (1/4) About to create train dataloader
2023-11-25 20:21:12,646 INFO [kd_datamodule.py:802] (1/4) About to get the audioset eval cuts.
2023-11-25 20:21:12,647 INFO [train_asr.py:1513] (1/4) CutSet(len=20681) [underlying data type: ]
2023-11-25 20:21:12,703 INFO [kd_datamodule.py:529] (1/4) About to create dev dataset
2023-11-25 20:21:13,142 INFO [kd_datamodule.py:550] (1/4) About to create dev dataloader
2023-11-25 20:21:13,143 INFO [train_asr.py:1527] (1/4) Loading grad scaler state dict
2023-11-25 20:21:48,365 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 0, loss[loss=0.1311, simple_loss=0.1088, pruned_loss=0.01357, audio_tagging_loss=0.06306, over 15635.00 frames. ], tot_loss[loss=0.1311, simple_loss=0.1088, pruned_loss=0.01357, audio_tagging_loss=0.06306, over 15635.00 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:21:48,365 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-25 20:22:06,062 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8250, 4.9660, 5.0957, 4.8649], device='cuda:1')
2023-11-25 20:22:08,430 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3039, 4.3179, 4.5084, 4.4468], device='cuda:1')
2023-11-25 20:22:20,736 INFO [train_asr.py:1267] (1/4) Epoch 39, validation: loss=0.127, simple_loss=0.05083, pruned_loss=0.005243, audio_tagging_loss=0.09629, over 4681554.00 frames.
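
Note: "Using mux to combine Librispeech with audioset" and "Using SimpleCutSampler" above correspond to public lhotse APIs. A hedged sketch of that datamodule step; the manifest paths and the absence of mux weights are assumptions, not read from kd_datamodule.py:

    # Hedged sketch of the mux + sampler steps logged above, using public
    # lhotse APIs. The file names are hypothetical placeholders.
    from lhotse import CutSet
    from lhotse.dataset import SimpleCutSampler

    librispeech_cuts = CutSet.from_file("data/fbank/librispeech_cuts_train.jsonl.gz")  # hypothetical path
    audioset_cuts = CutSet.from_file("data/fbank/audioset_cuts_unbalanced.jsonl.gz")   # hypothetical path

    # mux lazily interleaves the two streams rather than concatenating them
    train_cuts = CutSet.mux(librispeech_cuts, audioset_cuts)

    # matches the logged config: max_duration=1000, shuffle=True, drop_last=True
    sampler = SimpleCutSampler(train_cuts, max_duration=1000.0,
                               shuffle=True, drop_last=True)
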
2023-11-25 20:22:20,737 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-25 20:22:25,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3046020.0, ans=0.125
2023-11-25 20:22:29,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3046020.0, ans=15.0
2023-11-25 20:22:35,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3046086.6666666665, ans=0.0
2023-11-25 20:22:41,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3046086.6666666665, ans=0.2
2023-11-25 20:23:10,930 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 456950
2023-11-25 20:23:16,259 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 50, loss[loss=0.07515, simple_loss=0.0696, pruned_loss=0.008694, audio_tagging_loss=0.03166, over 15517.00 frames. ], tot_loss[loss=0.09872, simple_loss=0.08579, pruned_loss=0.01241, audio_tagging_loss=0.04341, over 682278.77 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:23:28,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3046420.0, ans=0.1
2023-11-25 20:23:39,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.865e+01 9.525e+01 1.035e+02 1.246e+02 6.272e+02, threshold=2.069e+02, percent-clipped=17.0
2023-11-25 20:23:56,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3046553.3333333335, ans=0.0
2023-11-25 20:24:06,631 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457000
2023-11-25 20:24:12,192 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 100, loss[loss=0.07834, simple_loss=0.0652, pruned_loss=0.0116, audio_tagging_loss=0.03415, over 15076.00 frames. ], tot_loss[loss=0.09299, simple_loss=0.08841, pruned_loss=0.01257, audio_tagging_loss=0.03621, over 1208240.38 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:24:15,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3046686.6666666665, ans=0.0
2023-11-25 20:24:35,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046820.0, ans=0.1
2023-11-25 20:24:47,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3046886.6666666665, ans=0.0
2023-11-25 20:25:00,690 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457050
2023-11-25 20:25:02,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3046953.3333333335, ans=0.0
2023-11-25 20:25:05,939 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 150, loss[loss=0.08866, simple_loss=0.1113, pruned_loss=0.01632, audio_tagging_loss=0.01669, over 15227.00 frames. ], tot_loss[loss=0.08733, simple_loss=0.08984, pruned_loss=0.01281, audio_tagging_loss=0.0296, over 1618943.79 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
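
Note: the fractional frame counts in the tot_loss records above (e.g. "over 682278.77 frames") are consistent with a decayed running accumulator rather than a plain sum. A sketch assuming icefall's usual MetricsTracker arithmetic with reset_interval=200 from the config; not the actual implementation:

    # Sketch of the running statistic behind `tot_loss[... over N frames]`:
    # each batch's per-key sums are added to a total that first decays by
    # (1 - 1/reset_interval). This is an assumption consistent with the log.
    def update_tot(tot: dict, batch: dict, reset_interval: int = 200) -> dict:
        decay = 1.0 - 1.0 / reset_interval
        return {k: tot.get(k, 0.0) * decay + batch.get(k, 0.0)
                for k in set(tot) | set(batch)}

    # `batch` holds sums, e.g. {"frames": 15635, "loss": 0.1311 * 15635};
    # the logged tot_loss value is then tot["loss"] / tot["frames"].
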
2023-11-25 20:25:15,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3047086.6666666665, ans=0.125
2023-11-25 20:25:28,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.752e+01 9.435e+01 1.031e+02 1.991e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-25 20:25:55,168 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457100
2023-11-25 20:26:00,388 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 200, loss[loss=0.07684, simple_loss=0.1045, pruned_loss=0.01482, audio_tagging_loss=0.009792, over 13762.00 frames. ], tot_loss[loss=0.08367, simple_loss=0.09291, pruned_loss=0.01319, audio_tagging_loss=0.02402, over 1937467.52 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:26:08,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3047353.3333333335, ans=0.125
2023-11-25 20:26:37,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3047553.3333333335, ans=0.125
2023-11-25 20:26:49,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0
2023-11-25 20:26:50,188 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457150
2023-11-25 20:26:53,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3047620.0, ans=0.125
2023-11-25 20:26:53,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3047620.0, ans=0.1
2023-11-25 20:26:55,835 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 250, loss[loss=0.08018, simple_loss=0.107, pruned_loss=0.01522, audio_tagging_loss=0.01144, over 15077.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.09344, pruned_loss=0.01336, audio_tagging_loss=0.02033, over 2183155.01 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:27:16,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.232e+01 9.778e+01 1.082e+02 1.251e+02, threshold=1.956e+02, percent-clipped=0.0
2023-11-25 20:27:18,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0
2023-11-25 20:27:21,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3047820.0, ans=0.025
2023-11-25 20:27:28,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3047886.6666666665, ans=0.07
2023-11-25 20:27:44,423 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457200
2023-11-25 20:27:50,103 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 300, loss[loss=0.07488, simple_loss=0.0997, pruned_loss=0.01518, audio_tagging_loss=0.009852, over 15846.00 frames. ], tot_loss[loss=0.07742, simple_loss=0.09278, pruned_loss=0.01322, audio_tagging_loss=0.01782, over 2375918.68 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:27:55,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5
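
Note: in the optim.py:476 records, the five values after "grad-norm quartiles" read as the min/25%/median/75%/max of recently observed gradient norms, and the reported threshold tracks Clipping_scale times the median (e.g. 2.069e+02 ~= 2.0 * 1.035e+02 above); "percent-clipped" is the share of recent steps whose norm exceeded the threshold. A sketch under those assumptions, not the actual ScaledAdam code:

    # Sketch of median-based gradient clipping consistent with the
    # Clipping_scale records above; assumed, not copied from optim.py.
    import torch

    def clip_grad(recent_norms: torch.Tensor, grad: torch.Tensor,
                  clipping_scale: float = 2.0):
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # 2.0 * median, as in the log
        norm = grad.norm()
        if norm > threshold:               # counted in `percent-clipped`
            grad = grad * (threshold / norm)
        return grad, q, threshold
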
2023-11-25 20:27:59,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0
2023-11-25 20:27:59,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3048086.6666666665, ans=0.125
2023-11-25 20:28:00,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048086.6666666665, ans=0.1
2023-11-25 20:28:01,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3048086.6666666665, ans=0.1
2023-11-25 20:28:03,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.21 vs. limit=15.0
2023-11-25 20:28:17,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3048153.3333333335, ans=0.125
2023-11-25 20:28:20,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3048153.3333333335, ans=0.125
2023-11-25 20:28:21,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3048153.3333333335, ans=0.0
2023-11-25 20:28:22,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048220.0, ans=0.1
2023-11-25 20:28:28,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3048220.0, ans=0.025
2023-11-25 20:28:32,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3048286.6666666665, ans=0.0
2023-11-25 20:28:38,389 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457250
2023-11-25 20:28:43,508 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 350, loss[loss=0.08134, simple_loss=0.1086, pruned_loss=0.01782, audio_tagging_loss=0.009212, over 15340.00 frames. ], tot_loss[loss=0.07591, simple_loss=0.09356, pruned_loss=0.01327, audio_tagging_loss=0.01586, over 2523160.65 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0
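
Note: the scaling.py:213 records trace regularization hyperparameters (dropout, skip rates, balancer probabilities) whose current value `ans` is a function of batch_count. A sketch assuming the piecewise-linear semantics of icefall's ScheduledFloat; the breakpoints below are illustrative, not taken from this run:

    # Sketch of a ScheduledFloat: a float that is piecewise-linear in
    # batch_count between (batch, value) breakpoints, flat beyond the ends.
    def scheduled_float(batch_count: float,
                        schedule: list[tuple[float, float]]) -> float:
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return schedule[-1][1]

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches
    # would report ans=0.1 at batch_count=3046420.0, as in the records above:
    print(scheduled_float(3046420.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1
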
2023-11-25 20:28:44,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3048353.3333333335, ans=0.125
2023-11-25 20:29:04,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3048486.6666666665, ans=0.2
2023-11-25 20:29:05,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3048486.6666666665, ans=0.125
2023-11-25 20:29:05,851 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.941e+01 9.403e+01 1.014e+02 1.528e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-25 20:29:12,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3048486.6666666665, ans=0.125
2023-11-25 20:29:23,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3048553.3333333335, ans=0.05
2023-11-25 20:29:25,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3048620.0, ans=0.125
2023-11-25 20:29:31,908 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457300
2023-11-25 20:29:38,110 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 400, loss[loss=0.05395, simple_loss=0.06912, pruned_loss=0.008437, audio_tagging_loss=0.01095, over 15973.00 frames. ], tot_loss[loss=0.07383, simple_loss=0.09234, pruned_loss=0.01302, audio_tagging_loss=0.01464, over 2641171.11 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:29:43,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3048686.6666666665, ans=0.125
2023-11-25 20:29:53,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3048753.3333333335, ans=0.0
2023-11-25 20:29:59,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3048820.0, ans=0.1
2023-11-25 20:30:12,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3048886.6666666665, ans=0.0
2023-11-25 20:30:25,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3048953.3333333335, ans=0.2
2023-11-25 20:30:26,434 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457350
2023-11-25 20:30:32,074 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 450, loss[loss=0.07181, simple_loss=0.09453, pruned_loss=0.0116, audio_tagging_loss=0.01295, over 14756.00 frames. ], tot_loss[loss=0.07272, simple_loss=0.09209, pruned_loss=0.01298, audio_tagging_loss=0.0137, over 2729127.75 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:30:46,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3049086.6666666665, ans=0.125
2023-11-25 20:30:47,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3049086.6666666665, ans=0.0
2023-11-25 20:30:52,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.642e+01 9.445e+01 1.011e+02 1.527e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-25 20:31:20,671 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457400
2023-11-25 20:31:26,190 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 500, loss[loss=0.06973, simple_loss=0.09767, pruned_loss=0.01214, audio_tagging_loss=0.008755, over 16455.00 frames. ], tot_loss[loss=0.07194, simple_loss=0.09217, pruned_loss=0.01301, audio_tagging_loss=0.01285, over 2804365.24 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:31:37,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3049420.0, ans=0.0
2023-11-25 20:31:37,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3049420.0, ans=0.125
2023-11-25 20:31:46,214 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:32:07,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3049553.3333333335, ans=0.2
2023-11-25 20:32:14,252 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457450
2023-11-25 20:32:15,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3049620.0, ans=0.0
2023-11-25 20:32:20,582 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 550, loss[loss=0.0604, simple_loss=0.08193, pruned_loss=0.0108, audio_tagging_loss=0.008628, over 14690.00 frames. ], tot_loss[loss=0.07013, simple_loss=0.09033, pruned_loss=0.01257, audio_tagging_loss=0.0124, over 2856755.40 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:32:24,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3049686.6666666665, ans=0.0
2023-11-25 20:32:42,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.908e+01 9.696e+01 1.036e+02 1.301e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-25 20:32:44,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3049820.0, ans=0.2
2023-11-25 20:32:53,382 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:33:08,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457500
2023-11-25 20:33:10,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3049953.3333333335, ans=0.2
2023-11-25 20:33:13,977 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 600, loss[loss=0.04962, simple_loss=0.06025, pruned_loss=0.006951, audio_tagging_loss=0.01254, over 16490.00 frames. ], tot_loss[loss=0.07029, simple_loss=0.09139, pruned_loss=0.01277, audio_tagging_loss=0.01182, over 2905622.83 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 16.0
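
Note: the learning rate ticks down from 1.75e-03 to 1.74e-03 between batches 450 and 500 above. With base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the config, this is consistent with the Eden schedule used by these recipes; a sketch (the exact epoch/step accounting in train_asr.py is assumed):

    # Sketch of the Eden learning-rate schedule; with the logged config it
    # reproduces lr ~= 1.75e-03 around global step ~457000 in epoch 39.
    def eden_lr(base_lr: float, step: int, epoch: int,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(0.045, 457000, 38):.2e}")  # -> 1.75e-03 (epoch from 0)
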
2023-11-25 20:33:25,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3050086.6666666665, ans=0.125
2023-11-25 20:34:02,981 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457550
2023-11-25 20:34:08,175 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 650, loss[loss=0.05201, simple_loss=0.06828, pruned_loss=0.00732, audio_tagging_loss=0.01056, over 14238.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09141, pruned_loss=0.01268, audio_tagging_loss=0.01146, over 2941608.53 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:34:19,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.18 vs. limit=6.0
2023-11-25 20:34:22,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3050420.0, ans=0.1
2023-11-25 20:34:26,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.22 vs. limit=10.0
2023-11-25 20:34:29,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3050486.6666666665, ans=0.125
2023-11-25 20:34:31,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.926e+01 9.451e+01 1.004e+02 1.811e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-25 20:34:40,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3050553.3333333335, ans=0.125
2023-11-25 20:34:43,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3050553.3333333335, ans=0.1
2023-11-25 20:34:47,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3050553.3333333335, ans=0.0
2023-11-25 20:34:48,482 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:34:56,686 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457600
2023-11-25 20:35:02,819 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 700, loss[loss=0.0849, simple_loss=0.1223, pruned_loss=0.01799, audio_tagging_loss=0.005759, over 14776.00 frames. ], tot_loss[loss=0.06916, simple_loss=0.09063, pruned_loss=0.01264, audio_tagging_loss=0.01121, over 2964989.71 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:35:05,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3050686.6666666665, ans=0.2
2023-11-25 20:35:19,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3050753.3333333335, ans=0.2
2023-11-25 20:35:20,080 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:35:20,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0
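
Note: the scaling.py:1022 Whitening records compare a per-module statistic against its whitening_limit; "metric=6.26 vs. limit=15.0" means the module's whitening penalty is currently inactive. A sketch of one plausible such metric, assuming it measures how far a group's feature covariance is from a multiple of the identity (1.0 when perfectly "white"); the exact formula in scaling.py is not shown in this log:

    # Assumed whitening metric: mean(eig^2) / mean(eig)^2 of the per-group
    # feature covariance; equals 1.0 for isotropic features and grows as the
    # covariance becomes ill-conditioned. Illustrative only.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        n, c = x.shape                      # (num_frames, num_channels)
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n  # per-group covariance
        eigs = torch.linalg.eigvalsh(cov)
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()
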
2023-11-25 20:35:26,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3050820.0, ans=0.125
2023-11-25 20:35:33,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=12.0
2023-11-25 20:35:46,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3050953.3333333335, ans=0.0
2023-11-25 20:35:52,107 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457650
2023-11-25 20:35:57,277 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 750, loss[loss=0.07231, simple_loss=0.09458, pruned_loss=0.01292, audio_tagging_loss=0.0121, over 17139.00 frames. ], tot_loss[loss=0.06948, simple_loss=0.09129, pruned_loss=0.01285, audio_tagging_loss=0.01098, over 2990184.23 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:36:05,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3051020.0, ans=0.0
2023-11-25 20:36:07,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3051086.6666666665, ans=0.125
2023-11-25 20:36:07,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0
2023-11-25 20:36:19,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.878e+01 9.438e+01 1.006e+02 1.228e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-25 20:36:30,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0
2023-11-25 20:36:34,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.24 vs. limit=10.0
2023-11-25 20:36:41,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3051286.6666666665, ans=0.2
2023-11-25 20:36:45,756 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457700
2023-11-25 20:36:45,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3051286.6666666665, ans=0.125
2023-11-25 20:36:51,356 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 800, loss[loss=0.07017, simple_loss=0.09747, pruned_loss=0.01163, audio_tagging_loss=0.0098, over 16163.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09041, pruned_loss=0.01288, audio_tagging_loss=0.011, over 2999033.89 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:36:54,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3051353.3333333335, ans=0.125
2023-11-25 20:37:01,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3051420.0, ans=0.1
2023-11-25 20:37:06,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3051420.0, ans=0.125
2023-11-25 20:37:14,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3051486.6666666665, ans=0.07
2023-11-25 20:37:18,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3051486.6666666665, ans=0.1
2023-11-25 20:37:19,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3051486.6666666665, ans=0.1
2023-11-25 20:37:36,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0
2023-11-25 20:37:40,153 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457750
2023-11-25 20:37:41,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3051620.0, ans=0.1
2023-11-25 20:37:44,406 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:37:45,297 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 850, loss[loss=0.04222, simple_loss=0.05519, pruned_loss=0.005128, audio_tagging_loss=0.0095, over 14998.00 frames. ], tot_loss[loss=0.06864, simple_loss=0.08983, pruned_loss=0.0128, audio_tagging_loss=0.01092, over 3005260.08 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:37:47,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=12.0
2023-11-25 20:38:08,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.812e+01 9.277e+01 9.996e+01 1.418e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-25 20:38:11,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3051820.0, ans=0.125
2023-11-25 20:38:22,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3051886.6666666665, ans=0.0
2023-11-25 20:38:33,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0
2023-11-25 20:38:35,666 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457800
2023-11-25 20:38:40,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3052020.0, ans=0.125
2023-11-25 20:38:41,242 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 900, loss[loss=0.08025, simple_loss=0.1055, pruned_loss=0.01756, audio_tagging_loss=0.009943, over 15728.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09052, pruned_loss=0.01292, audio_tagging_loss=0.01077, over 3013398.59 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:38:55,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0
2023-11-25 20:39:02,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3052153.3333333335, ans=0.125
2023-11-25 20:39:09,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3052153.3333333335, ans=0.125
2023-11-25 20:39:17,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3052220.0, ans=0.125
2023-11-25 20:39:18,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3052220.0, ans=0.125
2023-11-25 20:39:29,903 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457850
2023-11-25 20:39:35,114 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 950, loss[loss=0.07555, simple_loss=0.1015, pruned_loss=0.01619, audio_tagging_loss=0.008616, over 14388.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09109, pruned_loss=0.01302, audio_tagging_loss=0.01055, over 3032590.93 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:39:40,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3052353.3333333335, ans=0.015
2023-11-25 20:39:43,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3052353.3333333335, ans=0.125
2023-11-25 20:39:46,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3052420.0, ans=0.125
2023-11-25 20:39:58,561 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.948e+01 9.425e+01 1.001e+02 1.201e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-25 20:40:01,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3052486.6666666665, ans=0.1
2023-11-25 20:40:06,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3052486.6666666665, ans=0.125
2023-11-25 20:40:18,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3052620.0, ans=0.125
2023-11-25 20:40:21,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2023-11-25 20:40:23,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457900
2023-11-25 20:40:23,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3052620.0, ans=0.0
2023-11-25 20:40:29,243 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1000, loss[loss=0.07482, simple_loss=0.09568, pruned_loss=0.01514, audio_tagging_loss=0.01185, over 15218.00 frames. ], tot_loss[loss=0.06846, simple_loss=0.09077, pruned_loss=0.01293, audio_tagging_loss=0.01014, over 3034072.49 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:40:35,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.14 vs. limit=22.5
2023-11-25 20:40:52,824 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:41:11,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3052886.6666666665, ans=0.04949747468305833
2023-11-25 20:41:19,401 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457950
2023-11-25 20:41:19,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3052953.3333333335, ans=0.125
2023-11-25 20:41:19,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0
2023-11-25 20:41:24,733 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1050, loss[loss=0.07818, simple_loss=0.1047, pruned_loss=0.01682, audio_tagging_loss=0.00899, over 15262.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09002, pruned_loss=0.01279, audio_tagging_loss=0.01005, over 3037726.34 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:41:28,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3053020.0, ans=0.09899494936611666
2023-11-25 20:41:46,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.20 vs. limit=10.0
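
Note: the WARNING above drops an AudioSet placeholder cut because after 4x subsampling its 100 input frames leave only 23 encoder frames, fewer than the 24 BPE tokens of the dummy transcript, so a transducer alignment is impossible. A sketch of that length check; the subsampling arithmetic reproduces the logged 100 -> 23, though the exact filter in train_asr.py may differ in detail:

    # Sketch of the length filter behind the "Exclude cut" warnings: a
    # transducer needs at least as many encoder frames as output tokens.
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2   # 100 -> 23, as logged

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(keep_cut(100, 24))  # False -> excluded, as in the warning above
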
2023-11-25 20:41:46,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3053153.3333333335, ans=0.0
2023-11-25 20:41:48,834 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.789e+01 9.440e+01 1.020e+02 1.231e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-25 20:41:53,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3053153.3333333335, ans=0.0
2023-11-25 20:41:55,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3053220.0, ans=0.125
2023-11-25 20:42:05,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3053220.0, ans=0.125
2023-11-25 20:42:07,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3053286.6666666665, ans=0.0
2023-11-25 20:42:12,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3053286.6666666665, ans=0.125
2023-11-25 20:42:13,610 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458000
2023-11-25 20:42:16,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3053286.6666666665, ans=0.125
2023-11-25 20:42:16,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3053286.6666666665, ans=0.1
2023-11-25 20:42:19,167 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1100, loss[loss=0.06999, simple_loss=0.09563, pruned_loss=0.01516, audio_tagging_loss=0.00701, over 15120.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.08986, pruned_loss=0.01277, audio_tagging_loss=0.009925, over 3040049.83 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 20:42:20,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3053353.3333333335, ans=0.125
2023-11-25 20:42:21,250 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:42:34,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3053420.0, ans=0.125
2023-11-25 20:42:37,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3053420.0, ans=0.0
2023-11-25 20:42:46,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3053486.6666666665, ans=0.1
2023-11-25 20:43:03,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3053620.0, ans=0.0
2023-11-25 20:43:07,627 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458050
2023-11-25 20:43:08,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5
2023-11-25 20:43:10,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=12.0
2023-11-25 20:43:12,716 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1150, loss[loss=0.04983, simple_loss=0.06084, pruned_loss=0.009595, audio_tagging_loss=0.009816, over 14627.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09035, pruned_loss=0.01276, audio_tagging_loss=0.009842, over 3042835.95 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 20:43:15,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3053686.6666666665, ans=0.1
2023-11-25 20:43:17,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0
2023-11-25 20:43:20,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3053686.6666666665, ans=0.125
2023-11-25 20:43:38,707 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.733e+01 9.355e+01 1.009e+02 1.328e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-25 20:43:47,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3053886.6666666665, ans=0.07
2023-11-25 20:43:48,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.71 vs. limit=22.5
2023-11-25 20:43:56,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3053953.3333333335, ans=0.0
2023-11-25 20:44:02,802 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458100
2023-11-25 20:44:08,915 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1200, loss[loss=0.08916, simple_loss=0.1173, pruned_loss=0.02171, audio_tagging_loss=0.008806, over 14682.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09053, pruned_loss=0.01275, audio_tagging_loss=0.009815, over 3039008.87 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0
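
Note: with use_fp16=True, the `grad_scale` field tracks the AMP loss scale. torch's GradScaler halves its scale whenever a step produces inf/nan gradients, which explains the drops 32 -> 16 (batch 550) and 16 -> 8 (batch 1100); the recoveries at batches 1200 and 1600, both multiples of 400, are consistent with the recipe periodically doubling the scale back up. A sketch using the real torch API; the model, the doubling cadence and the cap are assumptions:

    # Sketch of the AMP bookkeeping behind `grad_scale`; placeholder model.
    import torch

    model = torch.nn.Linear(80, 500)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def train_step(batch_idx: int, feats: torch.Tensor, tgts: torch.Tensor):
        optimizer.zero_grad()
        with torch.autocast(device_type="cuda", dtype=torch.float16,
                            enabled=torch.cuda.is_available()):
            loss = torch.nn.functional.cross_entropy(model(feats), tgts)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped (and scale halved) on inf/nan grads
        scaler.update()
        # assumed periodic recovery, matching the doublings at batches
        # 1200 and 1600 above; the exact rule in train_asr.py is not shown
        if batch_idx % 400 == 0 and scaler.get_scale() < 32.0:
            scaler.update(new_scale=2.0 * scaler.get_scale())
        # scaler.get_scale() is the value logged as grad_scale
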
2023-11-25 20:44:37,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3054153.3333333335, ans=0.2
2023-11-25 20:44:57,727 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458150
2023-11-25 20:44:57,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3054286.6666666665, ans=0.125
2023-11-25 20:45:02,907 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1250, loss[loss=0.08024, simple_loss=0.1056, pruned_loss=0.01561, audio_tagging_loss=0.01185, over 13868.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.091, pruned_loss=0.01285, audio_tagging_loss=0.009856, over 3040269.99 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:45:08,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3054353.3333333335, ans=0.125
2023-11-25 20:45:22,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=3054486.6666666665, ans=0.1
2023-11-25 20:45:25,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3054486.6666666665, ans=0.125
2023-11-25 20:45:27,924 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 9.172e+01 9.748e+01 1.035e+02 1.279e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-25 20:45:51,547 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458200
2023-11-25 20:45:56,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.93 vs. limit=10.0
2023-11-25 20:45:57,094 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1300, loss[loss=0.05907, simple_loss=0.07371, pruned_loss=0.01116, audio_tagging_loss=0.01105, over 14456.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.0907, pruned_loss=0.013, audio_tagging_loss=0.009754, over 3040899.54 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:46:04,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3054686.6666666665, ans=0.025
2023-11-25 20:46:08,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3054753.3333333335, ans=0.125
2023-11-25 20:46:08,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3054753.3333333335, ans=0.2
2023-11-25 20:46:09,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3054753.3333333335, ans=0.0
2023-11-25 20:46:17,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3054753.3333333335, ans=0.1
2023-11-25 20:46:19,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3054820.0, ans=0.125
2023-11-25 20:46:31,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3054886.6666666665, ans=0.0
2023-11-25 20:46:34,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3054886.6666666665, ans=0.0
2023-11-25 20:46:35,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3054886.6666666665, ans=0.07
2023-11-25 20:46:38,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3054886.6666666665, ans=0.0
2023-11-25 20:46:45,796 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458250
2023-11-25 20:46:52,084 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1350, loss[loss=0.0787, simple_loss=0.1023, pruned_loss=0.01775, audio_tagging_loss=0.00979, over 16486.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09061, pruned_loss=0.0129, audio_tagging_loss=0.009648, over 3043243.38 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:46:53,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3055020.0, ans=0.0
2023-11-25 20:47:01,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3055020.0, ans=0.2
2023-11-25 20:47:16,592 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.731e+01 9.396e+01 1.007e+02 1.248e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-25 20:47:17,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3055153.3333333335, ans=0.2
2023-11-25 20:47:27,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3055220.0, ans=0.0
2023-11-25 20:47:31,804 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:47:41,149 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458300
2023-11-25 20:47:46,332 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1400, loss[loss=0.0622, simple_loss=0.08286, pruned_loss=0.01049, audio_tagging_loss=0.01027, over 15477.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.0907, pruned_loss=0.0129, audio_tagging_loss=0.009648, over 3045140.05 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:47:52,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0
2023-11-25 20:47:52,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=12.0
2023-11-25 20:48:07,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3055486.6666666665, ans=0.1
2023-11-25 20:48:11,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0
2023-11-25 20:48:25,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3055553.3333333335, ans=0.125
2023-11-25 20:48:35,027 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458350
2023-11-25 20:48:40,186 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1450, loss[loss=0.08538, simple_loss=0.1158, pruned_loss=0.01753, audio_tagging_loss=0.009974, over 14605.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09052, pruned_loss=0.01288, audio_tagging_loss=0.009713, over 3042363.62 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:48:50,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0
2023-11-25 20:49:05,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.671e+01 9.348e+01 1.019e+02 1.564e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-25 20:49:15,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=15.0
2023-11-25 20:49:18,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3055886.6666666665, ans=0.5
2023-11-25 20:49:28,718 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458400
2023-11-25 20:49:34,814 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1500, loss[loss=0.05829, simple_loss=0.07674, pruned_loss=0.009858, audio_tagging_loss=0.01006, over 15216.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09054, pruned_loss=0.0129, audio_tagging_loss=0.009734, over 3044554.02 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:49:59,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3056153.3333333335, ans=0.125
2023-11-25 20:50:04,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3056153.3333333335, ans=0.125
2023-11-25 20:50:07,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3056220.0, ans=0.0
2023-11-25 20:50:25,383 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458450
2023-11-25 20:50:25,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3056286.6666666665, ans=0.0
2023-11-25 20:50:29,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3056353.3333333335, ans=0.125
2023-11-25 20:50:30,452 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1550, loss[loss=0.07398, simple_loss=0.09132, pruned_loss=0.01639, audio_tagging_loss=0.01193, over 16173.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09171, pruned_loss=0.01297, audio_tagging_loss=0.009789, over 3044584.08 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:50:50,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3056486.6666666665, ans=0.0
2023-11-25 20:50:54,345 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.789e+01 9.295e+01 1.002e+02 1.264e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-25 20:51:10,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=22.5
2023-11-25 20:51:19,481 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458500
2023-11-25 20:51:24,691 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1600, loss[loss=0.07156, simple_loss=0.09612, pruned_loss=0.01474, audio_tagging_loss=0.008757, over 16978.00 frames. ], tot_loss[loss=0.06853, simple_loss=0.09154, pruned_loss=0.0129, audio_tagging_loss=0.009859, over 3046787.52 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:51:46,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3056820.0, ans=0.2
2023-11-25 20:52:04,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3056886.6666666665, ans=0.0
2023-11-25 20:52:08,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3056953.3333333335, ans=0.125
2023-11-25 20:52:12,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3056953.3333333335, ans=0.0
2023-11-25 20:52:13,789 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458550
2023-11-25 20:52:18,976 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1650, loss[loss=0.07266, simple_loss=0.09392, pruned_loss=0.01282, audio_tagging_loss=0.01288, over 15127.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09129, pruned_loss=0.01294, audio_tagging_loss=0.009845, over 3052217.51 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
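
Note: the "Enable SpecAugment / Time warp factor: 80 / Num frame mask: 10" lines logged by kd_datamodule.py near the top of this file correspond to lhotse's SpecAugment transform. A sketch with the logged values; the remaining arguments are assumed defaults, not read from this run:

    # Sketch of the SpecAugment setup logged near the top; time_warp_factor
    # and num_frame_masks come from the log, the rest are assumptions.
    from lhotse.dataset import SpecAugment

    spec_augment = SpecAugment(
        time_warp_factor=80,   # "Time warp factor: 80"
        num_frame_masks=10,    # "Num frame mask: 10"
        features_mask_size=27,
        num_feature_masks=2,
        frames_mask_size=100,
    )
    # applied to a (batch, time, feature) tensor of fbank features inside
    # the train dataset's __getitem__
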
2023-11-25 20:52:23,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3057020.0, ans=0.07
2023-11-25 20:52:30,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3057086.6666666665, ans=0.07
2023-11-25 20:52:34,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3057086.6666666665, ans=0.2
2023-11-25 20:52:38,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3057086.6666666665, ans=10.0
2023-11-25 20:52:39,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.24 vs. limit=10.0
2023-11-25 20:52:44,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.829e+01 9.450e+01 1.011e+02 1.260e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-25 20:52:49,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3057153.3333333335, ans=0.1
2023-11-25 20:53:09,747 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458600
2023-11-25 20:53:09,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3057286.6666666665, ans=0.05
2023-11-25 20:53:15,267 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1700, loss[loss=0.06638, simple_loss=0.09324, pruned_loss=0.01103, audio_tagging_loss=0.008726, over 14886.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.09118, pruned_loss=0.01289, audio_tagging_loss=0.009837, over 3055619.50 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:53:46,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3057486.6666666665, ans=10.0
2023-11-25 20:53:51,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3057553.3333333335, ans=0.125
2023-11-25 20:54:05,002 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458650
2023-11-25 20:54:06,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3057620.0, ans=0.125
2023-11-25 20:54:10,163 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1750, loss[loss=0.06842, simple_loss=0.1017, pruned_loss=0.01132, audio_tagging_loss=0.006229, over 15107.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.09086, pruned_loss=0.01285, audio_tagging_loss=0.009716, over 3059032.84 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:54:12,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3057686.6666666665, ans=0.2 2023-11-25 20:54:32,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3057820.0, ans=0.1 2023-11-25 20:54:36,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.599e+01 9.189e+01 9.882e+01 1.189e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-25 20:54:39,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3057820.0, ans=0.0 2023-11-25 20:54:59,274 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458700 2023-11-25 20:55:04,394 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1800, loss[loss=0.07484, simple_loss=0.1067, pruned_loss=0.01464, audio_tagging_loss=0.006861, over 15223.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.0911, pruned_loss=0.01279, audio_tagging_loss=0.009593, over 3056005.71 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:55:04,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3058020.0, ans=0.125 2023-11-25 20:55:35,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.63 vs. limit=10.0 2023-11-25 20:55:51,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3058286.6666666665, ans=0.0 2023-11-25 20:55:53,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3058286.6666666665, ans=0.0 2023-11-25 20:55:54,754 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458750 2023-11-25 20:56:00,480 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1850, loss[loss=0.08839, simple_loss=0.1213, pruned_loss=0.02074, audio_tagging_loss=0.00701, over 15953.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.09137, pruned_loss=0.01284, audio_tagging_loss=0.009523, over 3054834.08 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:56:19,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3058420.0, ans=0.125 2023-11-25 20:56:22,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0 2023-11-25 20:56:26,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.557e+01 9.640e+01 1.041e+02 1.665e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-25 20:56:42,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058553.3333333335, ans=0.1 2023-11-25 20:56:44,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3058620.0, ans=0.0 2023-11-25 20:56:46,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.31 vs. 
limit=15.0 2023-11-25 20:56:49,846 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458800 2023-11-25 20:56:55,913 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1900, loss[loss=0.0833, simple_loss=0.111, pruned_loss=0.0175, audio_tagging_loss=0.01028, over 15841.00 frames. ], tot_loss[loss=0.06827, simple_loss=0.09197, pruned_loss=0.01287, audio_tagging_loss=0.009409, over 3052741.04 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:57:10,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2023-11-25 20:57:44,947 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458850 2023-11-25 20:57:50,114 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1950, loss[loss=0.06881, simple_loss=0.09625, pruned_loss=0.01268, audio_tagging_loss=0.008009, over 15877.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09166, pruned_loss=0.0128, audio_tagging_loss=0.00941, over 3051895.08 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:58:16,758 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.741e+01 8.855e+01 9.248e+01 9.928e+01 1.852e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-25 20:58:16,985 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 20:58:39,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458900 2023-11-25 20:58:45,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3059353.3333333335, ans=0.015 2023-11-25 20:58:45,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3059353.3333333335, ans=0.1 2023-11-25 20:58:46,029 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2000, loss[loss=0.07847, simple_loss=0.1098, pruned_loss=0.01549, audio_tagging_loss=0.008093, over 15284.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.0915, pruned_loss=0.0128, audio_tagging_loss=0.009477, over 3046649.84 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 20:59:19,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3059553.3333333335, ans=0.125 2023-11-25 20:59:35,132 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458950 2023-11-25 20:59:40,298 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2050, loss[loss=0.0668, simple_loss=0.08806, pruned_loss=0.01666, audio_tagging_loss=0.006107, over 14892.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09144, pruned_loss=0.01268, audio_tagging_loss=0.009445, over 3050095.68 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:59:43,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3059686.6666666665, ans=0.125 2023-11-25 20:59:45,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-11-25 20:59:55,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.45 vs. 
limit=12.0 2023-11-25 21:00:00,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3059753.3333333335, ans=0.09899494936611666 2023-11-25 21:00:08,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.675e+01 9.370e+01 9.808e+01 1.405e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-25 21:00:10,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3059820.0, ans=0.0 2023-11-25 21:00:15,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3059886.6666666665, ans=0.1 2023-11-25 21:00:23,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.84 vs. limit=15.0 2023-11-25 21:00:26,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3059953.3333333335, ans=0.125 2023-11-25 21:00:29,790 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459000 2023-11-25 21:00:32,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3059953.3333333335, ans=0.1 2023-11-25 21:00:35,278 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2100, loss[loss=0.07465, simple_loss=0.09377, pruned_loss=0.01776, audio_tagging_loss=0.01001, over 15571.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.09237, pruned_loss=0.01285, audio_tagging_loss=0.009286, over 3058117.32 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:00:41,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3060020.0, ans=0.125 2023-11-25 21:01:02,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3060153.3333333335, ans=0.125 2023-11-25 21:01:22,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=22.5 2023-11-25 21:01:24,397 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459050 2023-11-25 21:01:24,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3060286.6666666665, ans=0.125 2023-11-25 21:01:30,070 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2150, loss[loss=0.07735, simple_loss=0.1046, pruned_loss=0.01673, audio_tagging_loss=0.00831, over 16121.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09177, pruned_loss=0.01284, audio_tagging_loss=0.009313, over 3057096.27 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:01:37,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.47 vs. 
limit=12.0 2023-11-25 21:01:43,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3060420.0, ans=0.025 2023-11-25 21:01:48,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3060420.0, ans=0.125 2023-11-25 21:01:58,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.469e+01 9.111e+01 9.697e+01 1.371e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-25 21:02:03,367 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:02:03,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3060553.3333333335, ans=0.125 2023-11-25 21:02:19,651 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459100 2023-11-25 21:02:24,876 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2200, loss[loss=0.07313, simple_loss=0.09841, pruned_loss=0.01523, audio_tagging_loss=0.008692, over 15240.00 frames. ], tot_loss[loss=0.06826, simple_loss=0.09213, pruned_loss=0.01292, audio_tagging_loss=0.009276, over 3049970.94 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:02:34,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3060753.3333333335, ans=0.125 2023-11-25 21:02:39,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3060753.3333333335, ans=0.125 2023-11-25 21:02:39,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=15.0 2023-11-25 21:02:55,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3060820.0, ans=0.1 2023-11-25 21:03:01,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3060886.6666666665, ans=0.0 2023-11-25 21:03:07,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=12.0 2023-11-25 21:03:13,463 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459150 2023-11-25 21:03:18,636 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2250, loss[loss=0.09059, simple_loss=0.1155, pruned_loss=0.02218, audio_tagging_loss=0.01066, over 15892.00 frames. ], tot_loss[loss=0.06864, simple_loss=0.09261, pruned_loss=0.01309, audio_tagging_loss=0.00925, over 3050605.72 frames. 
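The WARNING above shows why these 1-second AudioSet cuts carrying the dummy transcript get dropped: at subsampling factor 4, a 100-frame input leaves the encoder embed as 23 frames, fewer than the 24 BPE tokens, so the transducer loss has no valid alignment path. The 100 -> 23 count is consistent with two stride-2 stages that each trim some boundary context; a back-of-the-envelope check (the exact kernel/context sizes are an assumption, not read from the log):

    def frames_after_subsampling(t: int) -> int:
        t = (t - 7) // 2   # first stride-2 stage with ~7 frames of context: 100 -> 46
        t = t // 2         # second stride-2 stage: 46 -> 23
        return t

    assert frames_after_subsampling(100) == 23  # matches the warning above

Presumably the filter keeps a cut only if it has at least as many post-subsampling frames as tokens; here 23 < 24, so the cut is excluded.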
], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:03:18,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3061020.0, ans=0.125 2023-11-25 21:03:23,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3061020.0, ans=0.125 2023-11-25 21:03:25,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5 2023-11-25 21:03:32,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3061086.6666666665, ans=0.1 2023-11-25 21:03:47,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.734e+01 9.500e+01 1.033e+02 1.214e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-25 21:03:48,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3061153.3333333335, ans=0.125 2023-11-25 21:03:53,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3061220.0, ans=0.0 2023-11-25 21:04:01,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3061286.6666666665, ans=0.0 2023-11-25 21:04:07,699 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459200 2023-11-25 21:04:13,874 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2300, loss[loss=0.07232, simple_loss=0.1013, pruned_loss=0.0148, audio_tagging_loss=0.006852, over 15303.00 frames. ], tot_loss[loss=0.06854, simple_loss=0.09226, pruned_loss=0.01311, audio_tagging_loss=0.009307, over 3051430.13 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:04:14,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3061353.3333333335, ans=0.125 2023-11-25 21:04:33,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2023-11-25 21:04:40,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3061486.6666666665, ans=0.2 2023-11-25 21:04:41,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3061486.6666666665, ans=0.0 2023-11-25 21:04:48,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3061553.3333333335, ans=0.1 2023-11-25 21:04:55,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.10 vs. limit=15.0 2023-11-25 21:04:57,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2023-11-25 21:05:03,121 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:05:03,164 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459250 2023-11-25 21:05:08,342 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2350, loss[loss=0.06033, simple_loss=0.0792, pruned_loss=0.01148, audio_tagging_loss=0.009245, over 14517.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09126, pruned_loss=0.01302, audio_tagging_loss=0.009351, over 3044866.38 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:05:08,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.12 vs. limit=15.0 2023-11-25 21:05:13,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3061686.6666666665, ans=0.2 2023-11-25 21:05:19,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3061753.3333333335, ans=0.1 2023-11-25 21:05:23,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3061753.3333333335, ans=0.125 2023-11-25 21:05:34,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0 2023-11-25 21:05:36,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.670e+01 9.358e+01 1.017e+02 1.318e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-25 21:05:38,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3061820.0, ans=0.1 2023-11-25 21:05:57,203 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459300 2023-11-25 21:06:02,361 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2400, loss[loss=0.07597, simple_loss=0.1031, pruned_loss=0.01569, audio_tagging_loss=0.00876, over 15944.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09169, pruned_loss=0.01309, audio_tagging_loss=0.009415, over 3041195.58 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:06:03,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3062020.0, ans=0.025 2023-11-25 21:06:09,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2023-11-25 21:06:24,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5 2023-11-25 21:06:25,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.69 vs. 
limit=15.0 2023-11-25 21:06:28,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3062153.3333333335, ans=0.125 2023-11-25 21:06:28,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3062153.3333333335, ans=0.025 2023-11-25 21:06:50,891 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459350 2023-11-25 21:06:56,593 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2450, loss[loss=0.08474, simple_loss=0.116, pruned_loss=0.02109, audio_tagging_loss=0.005664, over 15373.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09109, pruned_loss=0.01287, audio_tagging_loss=0.009514, over 3037967.70 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:07:24,408 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.490e+01 9.348e+01 1.014e+02 1.568e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-25 21:07:27,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3062553.3333333335, ans=0.125 2023-11-25 21:07:33,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.71 vs. limit=22.5 2023-11-25 21:07:45,844 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459400 2023-11-25 21:07:51,344 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2500, loss[loss=0.0578, simple_loss=0.07744, pruned_loss=0.009609, audio_tagging_loss=0.009468, over 15043.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.09105, pruned_loss=0.01297, audio_tagging_loss=0.009604, over 3038590.20 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:07:56,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3062686.6666666665, ans=0.125 2023-11-25 21:08:01,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3062753.3333333335, ans=0.125 2023-11-25 21:08:03,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3062753.3333333335, ans=0.0 2023-11-25 21:08:09,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3062753.3333333335, ans=0.0 2023-11-25 21:08:27,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3062886.6666666665, ans=0.125 2023-11-25 21:08:39,827 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459450 2023-11-25 21:08:39,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3062953.3333333335, ans=0.125 2023-11-25 21:08:44,902 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2550, loss[loss=0.05946, simple_loss=0.08634, pruned_loss=0.008124, audio_tagging_loss=0.008164, over 14600.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09048, pruned_loss=0.01286, audio_tagging_loss=0.009495, over 3041748.25 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:08:56,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.91 vs. 
limit=22.5 2023-11-25 21:09:02,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3063086.6666666665, ans=0.1 2023-11-25 21:09:11,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3063153.3333333335, ans=0.125 2023-11-25 21:09:13,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.639e+01 9.480e+01 1.019e+02 1.523e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-25 21:09:33,257 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459500 2023-11-25 21:09:33,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3063286.6666666665, ans=0.0 2023-11-25 21:09:38,360 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2600, loss[loss=0.07013, simple_loss=0.1032, pruned_loss=0.0111, audio_tagging_loss=0.007435, over 14974.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09078, pruned_loss=0.0129, audio_tagging_loss=0.009407, over 3047947.30 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:09:41,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3063353.3333333335, ans=0.05 2023-11-25 21:09:44,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3063353.3333333335, ans=0.125 2023-11-25 21:09:44,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3063353.3333333335, ans=0.1 2023-11-25 21:10:18,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=3063553.3333333335, ans=12.0 2023-11-25 21:10:20,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-25 21:10:28,899 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459550 2023-11-25 21:10:30,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3063620.0, ans=0.125 2023-11-25 21:10:34,013 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2650, loss[loss=0.07087, simple_loss=0.08786, pruned_loss=0.01374, audio_tagging_loss=0.01319, over 15265.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09, pruned_loss=0.0128, audio_tagging_loss=0.00937, over 3041780.68 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:10:42,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3063686.6666666665, ans=0.125 2023-11-25 21:10:47,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.61 vs. 
limit=15.0 2023-11-25 21:10:47,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3063753.3333333335, ans=0.125 2023-11-25 21:10:54,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3063820.0, ans=0.2 2023-11-25 21:11:01,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.401e+01 9.228e+01 9.795e+01 1.294e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-25 21:11:02,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3063820.0, ans=0.2 2023-11-25 21:11:12,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2023-11-25 21:11:20,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3063953.3333333335, ans=0.125 2023-11-25 21:11:20,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3063953.3333333335, ans=0.2 2023-11-25 21:11:22,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459600 2023-11-25 21:11:28,314 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2700, loss[loss=0.0651, simple_loss=0.08159, pruned_loss=0.01667, audio_tagging_loss=0.007635, over 15049.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.08952, pruned_loss=0.01265, audio_tagging_loss=0.009341, over 3046149.91 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:11:30,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3064020.0, ans=0.125 2023-11-25 21:11:33,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3064020.0, ans=0.0 2023-11-25 21:11:35,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3064020.0, ans=0.1 2023-11-25 21:12:16,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459650 2023-11-25 21:12:21,808 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2750, loss[loss=0.09082, simple_loss=0.1116, pruned_loss=0.02513, audio_tagging_loss=0.009883, over 14945.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.08987, pruned_loss=0.01257, audio_tagging_loss=0.009306, over 3046542.82 frames. 
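The [scaling.py:1022] Whitening lines come from the Whiten regularizer on intermediate activations: each prints the measured whitening metric against the module's limit (e.g. metric=5.57 vs. limit=15.0 above), with the corrective gradient presumably applied only once the metric exceeds the limit. A rough reconstruction of such a metric, normalized so that perfectly white features (decorrelated, equal variance within each channel group) score 1.0; the exact formula in scaling.py may differ:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels); returns ~1.0 when each group is white
        n = x.shape[-1] // num_groups
        x = x.reshape(-1, num_groups, n).permute(1, 0, 2)     # (groups, T, n)
        cov = x.transpose(1, 2) @ x / x.shape[1]              # per-group covariance
        trace_cov_sq = (cov * cov).sum(dim=(1, 2))            # trace(C @ C)
        trace_cov = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1)
        # n * sum(eig^2) / (sum eig)^2 equals 1.0 iff all eigenvalues are equal
        return (n * trace_cov_sq / trace_cov.clamp(min=1e-20) ** 2).mean()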
], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:12:24,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3064353.3333333335, ans=0.125 2023-11-25 21:12:32,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3064420.0, ans=0.2 2023-11-25 21:12:41,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3064420.0, ans=0.125 2023-11-25 21:12:42,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3064420.0, ans=0.125 2023-11-25 21:12:45,692 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:12:50,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.675e+01 9.044e+01 9.844e+01 1.238e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-25 21:13:08,990 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:13:11,083 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459700 2023-11-25 21:13:15,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=15.0 2023-11-25 21:13:17,191 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2800, loss[loss=0.07023, simple_loss=0.08979, pruned_loss=0.01413, audio_tagging_loss=0.0112, over 15467.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09018, pruned_loss=0.01269, audio_tagging_loss=0.009272, over 3049581.12 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:13:32,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3064753.3333333335, ans=0.125 2023-11-25 21:13:33,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3064753.3333333335, ans=0.05 2023-11-25 21:13:41,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3064820.0, ans=0.125 2023-11-25 21:13:43,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3064820.0, ans=0.1 2023-11-25 21:13:48,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3064886.6666666665, ans=0.125 2023-11-25 21:13:51,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-11-25 21:14:06,787 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459750 2023-11-25 21:14:11,863 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2850, loss[loss=0.07989, simple_loss=0.108, pruned_loss=0.01982, audio_tagging_loss=0.006078, over 14338.00 frames. 
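The [scaling.py:1118] WithLoss lines report an auxiliary loss attached to the self-attention weights; loss-sum=0.000e+00 here means that penalty is currently contributing nothing. One plausible mechanism for attaching a loss this way, sketched rather than taken from icefall: an autograd function that is the identity on x in the forward pass but routes a gradient of 1.0 into the auxiliary scalar in backward, so the penalty is minimized alongside the main objective.

    import torch

    class AttachLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, aux_loss):
            ctx.aux_shape = aux_loss.shape
            return x                      # identity on the main tensor
        @staticmethod
        def backward(ctx, grad_out):
            one = torch.ones(ctx.aux_shape, dtype=grad_out.dtype,
                             device=grad_out.device)
            return grad_out, one          # aux_loss receives gradient 1.0

    # hypothetical usage: attn = AttachLoss.apply(attn, some_penalty(attn))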
], tot_loss[loss=0.06667, simple_loss=0.08988, pruned_loss=0.01258, audio_tagging_loss=0.009146, over 3049344.18 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:14:24,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3065086.6666666665, ans=0.1 2023-11-25 21:14:30,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0 2023-11-25 21:14:31,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3065086.6666666665, ans=0.125 2023-11-25 21:14:39,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.595e+01 9.142e+01 9.903e+01 1.163e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-25 21:14:51,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3065220.0, ans=0.1 2023-11-25 21:14:59,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3065286.6666666665, ans=0.1 2023-11-25 21:15:00,263 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459800 2023-11-25 21:15:03,949 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:15:05,818 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2900, loss[loss=0.07669, simple_loss=0.09488, pruned_loss=0.02021, audio_tagging_loss=0.009042, over 15424.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.0901, pruned_loss=0.01278, audio_tagging_loss=0.009142, over 3043870.83 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:15:32,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3065486.6666666665, ans=0.125 2023-11-25 21:15:33,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0 2023-11-25 21:15:38,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3065553.3333333335, ans=0.125 2023-11-25 21:15:54,715 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459850 2023-11-25 21:15:59,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3065686.6666666665, ans=0.0 2023-11-25 21:16:00,338 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2950, loss[loss=0.07412, simple_loss=0.1003, pruned_loss=0.01525, audio_tagging_loss=0.008716, over 16031.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08897, pruned_loss=0.01275, audio_tagging_loss=0.009327, over 3036076.93 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:16:06,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3065686.6666666665, ans=0.0 2023-11-25 21:16:12,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.44 vs. 
limit=15.0 2023-11-25 21:16:14,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3065753.3333333335, ans=0.125 2023-11-25 21:16:17,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3065753.3333333335, ans=0.0 2023-11-25 21:16:19,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3065753.3333333335, ans=0.2 2023-11-25 21:16:24,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3065820.0, ans=15.0 2023-11-25 21:16:27,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.766e+01 9.471e+01 1.038e+02 1.516e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-25 21:16:31,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3065886.6666666665, ans=0.0 2023-11-25 21:16:35,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3065886.6666666665, ans=0.2 2023-11-25 21:16:37,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-25 21:16:42,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3065953.3333333335, ans=0.0 2023-11-25 21:16:45,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3065953.3333333335, ans=0.1 2023-11-25 21:16:47,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3065953.3333333335, ans=0.125 2023-11-25 21:16:49,095 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459900 2023-11-25 21:16:54,792 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3000, loss[loss=0.0603, simple_loss=0.07661, pruned_loss=0.01148, audio_tagging_loss=0.01051, over 14925.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.08978, pruned_loss=0.01266, audio_tagging_loss=0.009338, over 3036893.60 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:16:54,793 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-25 21:17:14,390 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4354, 3.6845, 2.9285, 3.9504], device='cuda:1') 2023-11-25 21:17:26,410 INFO [train_asr.py:1267] (1/4) Epoch 39, validation: loss=0.05939, simple_loss=0.05076, pruned_loss=0.005254, audio_tagging_loss=0.02875, over 4681554.00 frames. 2023-11-25 21:17:26,411 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-25 21:17:34,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.56 vs. limit=22.5 2023-11-25 21:17:44,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. 
limit=10.0 2023-11-25 21:17:46,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3066086.6666666665, ans=0.125 2023-11-25 21:18:09,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3066286.6666666665, ans=0.1 2023-11-25 21:18:12,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3066286.6666666665, ans=0.125 2023-11-25 21:18:14,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-11-25 21:18:16,070 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459950 2023-11-25 21:18:21,895 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3050, loss[loss=0.05783, simple_loss=0.07002, pruned_loss=0.01042, audio_tagging_loss=0.0124, over 13676.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.08986, pruned_loss=0.01281, audio_tagging_loss=0.009401, over 3030694.82 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:18:49,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.558e+01 9.280e+01 1.009e+02 1.298e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-25 21:18:53,065 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:18:53,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3066553.3333333335, ans=0.0 2023-11-25 21:18:55,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3066553.3333333335, ans=0.1 2023-11-25 21:19:11,083 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460000 2023-11-25 21:19:19,032 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3100, loss[loss=0.07665, simple_loss=0.1003, pruned_loss=0.01436, audio_tagging_loss=0.01216, over 13366.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.08967, pruned_loss=0.0127, audio_tagging_loss=0.009439, over 3037789.34 frames. ], batch size: 51, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:19:30,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3066753.3333333335, ans=0.125 2023-11-25 21:19:36,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=12.0 2023-11-25 21:19:41,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.22 vs. 
limit=10.0 2023-11-25 21:20:01,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3066953.3333333335, ans=0.1 2023-11-25 21:20:07,922 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460050 2023-11-25 21:20:13,164 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3150, loss[loss=0.07702, simple_loss=0.1001, pruned_loss=0.01718, audio_tagging_loss=0.009774, over 15003.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09027, pruned_loss=0.01278, audio_tagging_loss=0.00953, over 3039435.09 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:20:42,066 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.732e+01 9.474e+01 1.004e+02 1.246e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-25 21:20:44,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.50 vs. limit=22.5 2023-11-25 21:21:03,025 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460100 2023-11-25 21:21:09,291 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3200, loss[loss=0.0484, simple_loss=0.06166, pruned_loss=0.007008, audio_tagging_loss=0.01056, over 13748.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09064, pruned_loss=0.0129, audio_tagging_loss=0.009506, over 3039756.95 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:21:11,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3067353.3333333335, ans=0.2 2023-11-25 21:21:16,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3067353.3333333335, ans=0.125 2023-11-25 21:21:20,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=22.5 2023-11-25 21:21:23,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2023-11-25 21:21:36,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3067486.6666666665, ans=0.125 2023-11-25 21:21:51,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2023-11-25 21:21:53,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3067620.0, ans=0.0 2023-11-25 21:21:57,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3067620.0, ans=0.125 2023-11-25 21:21:58,625 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460150 2023-11-25 21:22:03,793 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3250, loss[loss=0.05174, simple_loss=0.06663, pruned_loss=0.009317, audio_tagging_loss=0.009112, over 15283.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09032, pruned_loss=0.01268, audio_tagging_loss=0.009609, over 3037821.81 frames. 
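Throughout this stretch the loss figures decompose, to logging precision, as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss: a half-weighted simple (non-pruned) transducer term, the pruned transducer term, and the full-weight audio-tagging distillation term. For the tot_loss just above: 0.5 * 0.09032 + 0.01268 + 0.009609 ≈ 0.06745, and the same combination reproduces the validation figures logged at 21:17:26 (0.5 * 0.05076 + 0.005254 + 0.02875 ≈ 0.05939; that validation pass also dumps per-head attention-weight entropies as a diagnostic). A one-line check:

    def combined_loss(simple, pruned, tagging, simple_scale=0.5, tagging_scale=1.0):
        return simple_scale * simple + pruned + tagging_scale * tagging

    assert abs(combined_loss(0.09032, 0.01268, 0.009609) - 0.06745) < 5e-5
    assert abs(combined_loss(0.05076, 0.005254, 0.02875) - 0.05939) < 5e-5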
], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:22:06,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3067686.6666666665, ans=0.125 2023-11-25 21:22:20,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3067753.3333333335, ans=0.125 2023-11-25 21:22:22,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3067753.3333333335, ans=0.0 2023-11-25 21:22:24,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3067820.0, ans=0.125 2023-11-25 21:22:32,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.616e+01 9.104e+01 1.013e+02 1.269e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-25 21:22:36,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3067886.6666666665, ans=0.125 2023-11-25 21:22:42,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3067886.6666666665, ans=0.0 2023-11-25 21:22:44,910 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:22:53,165 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460200 2023-11-25 21:22:53,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3067953.3333333335, ans=0.125 2023-11-25 21:22:59,149 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3300, loss[loss=0.07682, simple_loss=0.1053, pruned_loss=0.01345, audio_tagging_loss=0.01072, over 14210.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.09097, pruned_loss=0.01291, audio_tagging_loss=0.009672, over 3036567.12 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:23:00,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3068020.0, ans=0.125 2023-11-25 21:23:07,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3068020.0, ans=0.05 2023-11-25 21:23:16,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.33 vs. 
limit=22.5 2023-11-25 21:23:17,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3068086.6666666665, ans=0.1 2023-11-25 21:23:21,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3068153.3333333335, ans=0.125 2023-11-25 21:23:41,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3068220.0, ans=10.0 2023-11-25 21:23:44,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3068286.6666666665, ans=0.1 2023-11-25 21:23:48,837 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460250 2023-11-25 21:23:54,391 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3350, loss[loss=0.07347, simple_loss=0.09644, pruned_loss=0.0154, audio_tagging_loss=0.00985, over 16000.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09053, pruned_loss=0.01294, audio_tagging_loss=0.009539, over 3036397.52 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:23:57,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3068353.3333333335, ans=0.125 2023-11-25 21:23:58,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3068353.3333333335, ans=0.1 2023-11-25 21:24:02,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2023-11-25 21:24:22,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.673e+01 9.369e+01 1.012e+02 1.333e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-25 21:24:29,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3068553.3333333335, ans=0.0 2023-11-25 21:24:32,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.60 vs. limit=12.0 2023-11-25 21:24:44,753 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460300 2023-11-25 21:24:45,942 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:24:49,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0 2023-11-25 21:24:50,011 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3400, loss[loss=0.0615, simple_loss=0.08024, pruned_loss=0.01321, audio_tagging_loss=0.008177, over 14267.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.0905, pruned_loss=0.01289, audio_tagging_loss=0.009327, over 3032161.81 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:24:58,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.84 vs. 
limit=15.0 2023-11-25 21:24:59,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3068753.3333333335, ans=0.0 2023-11-25 21:25:12,169 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:25:12,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-11-25 21:25:15,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3068820.0, ans=0.2 2023-11-25 21:25:22,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3068886.6666666665, ans=0.1 2023-11-25 21:25:27,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.70 vs. limit=22.5 2023-11-25 21:25:39,049 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460350 2023-11-25 21:25:44,246 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3450, loss[loss=0.05936, simple_loss=0.07688, pruned_loss=0.01481, audio_tagging_loss=0.006116, over 14240.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09053, pruned_loss=0.01288, audio_tagging_loss=0.009259, over 3028008.11 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:25:45,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3069020.0, ans=0.5 2023-11-25 21:25:45,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=15.0 2023-11-25 21:25:51,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3069020.0, ans=0.025 2023-11-25 21:25:56,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3069086.6666666665, ans=0.125 2023-11-25 21:25:59,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3069086.6666666665, ans=0.125 2023-11-25 21:26:13,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.826e+01 9.469e+01 1.006e+02 1.325e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-25 21:26:17,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3069220.0, ans=0.1 2023-11-25 21:26:22,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3069220.0, ans=0.125 2023-11-25 21:26:27,730 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:26:34,490 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460400 2023-11-25 21:26:40,241 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3500, loss[loss=0.07461, simple_loss=0.101, pruned_loss=0.0142, audio_tagging_loss=0.009894, over 16192.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.08993, pruned_loss=0.0128, audio_tagging_loss=0.009191, over 3026824.11 frames. 
], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:26:46,657 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:27:07,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3069486.6666666665, ans=0.2 2023-11-25 21:27:08,725 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:27:11,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3069486.6666666665, ans=0.125 2023-11-25 21:27:30,899 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460450 2023-11-25 21:27:36,119 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3550, loss[loss=0.07463, simple_loss=0.0987, pruned_loss=0.01547, audio_tagging_loss=0.009815, over 14953.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.08997, pruned_loss=0.0128, audio_tagging_loss=0.00914, over 3031148.27 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:27:56,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3069820.0, ans=0.125 2023-11-25 21:27:56,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3069820.0, ans=0.125 2023-11-25 21:28:03,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3069820.0, ans=0.0 2023-11-25 21:28:05,015 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.535e+01 9.230e+01 9.860e+01 1.398e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-25 21:28:09,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.77 vs. limit=22.5 2023-11-25 21:28:14,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0 2023-11-25 21:28:17,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3069886.6666666665, ans=0.2 2023-11-25 21:28:23,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3069953.3333333335, ans=0.1 2023-11-25 21:28:24,819 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460500 2023-11-25 21:28:30,008 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3600, loss[loss=0.07582, simple_loss=0.1075, pruned_loss=0.01558, audio_tagging_loss=0.006516, over 15631.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09049, pruned_loss=0.01298, audio_tagging_loss=0.009084, over 3035092.56 frames. 
2023-11-25 21:28:56,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3070153.3333333335, ans=0.0
2023-11-25 21:29:05,638 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:29:06,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3070220.0, ans=0.1
2023-11-25 21:29:19,208 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460550
2023-11-25 21:29:24,872 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3650, loss[loss=0.05033, simple_loss=0.06796, pruned_loss=0.007669, audio_tagging_loss=0.008682, over 15627.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09018, pruned_loss=0.01297, audio_tagging_loss=0.009068, over 3032975.08 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:29:25,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2023-11-25 21:29:32,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3070353.3333333335, ans=0.125
2023-11-25 21:29:36,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3070420.0, ans=0.0
2023-11-25 21:29:37,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3070420.0, ans=0.0
2023-11-25 21:29:39,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3070420.0, ans=0.0
2023-11-25 21:29:39,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3070420.0, ans=0.125
2023-11-25 21:29:49,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3070486.6666666665, ans=0.125
2023-11-25 21:29:54,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.628e+01 9.158e+01 1.002e+02 1.364e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-25 21:30:03,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3070553.3333333335, ans=0.125
2023-11-25 21:30:15,076 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460600
2023-11-25 21:30:16,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3070620.0, ans=0.0
2023-11-25 21:30:20,511 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3700, loss[loss=0.04375, simple_loss=0.05182, pruned_loss=0.01059, audio_tagging_loss=0.007252, over 14993.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09047, pruned_loss=0.01297, audio_tagging_loss=0.009075, over 3041788.03 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0
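[Annotation] Each batch line reports a per-batch loss[...] and a running tot_loss[...]. The reported loss is consistent with a fixed weighting of the three components: for the batch 3700 tot_loss just above, 0.5 * 0.09047 + 0.01297 + 0.009075 ≈ 0.06729, the logged value. A quick check of that relation (the 0.5 weight on simple_loss is inferred from the logged numbers, not taken from the code):

```python
# Check: loss ≈ 0.5 * simple_loss + pruned_loss + audio_tagging_loss,
# using the batch 3700 tot_loss values logged above.
simple, pruned, tagging = 0.09047, 0.01297, 0.009075
print(0.5 * simple + pruned + tagging)  # 0.06728..., vs. logged loss=0.06729
```

The same relation holds for the other tot_loss entries in this section, e.g. batch 3850 below: 0.5 * 0.0912 + 0.01304 + 0.00932 = 0.06796.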
2023-11-25 21:30:34,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3070753.3333333335, ans=0.0
2023-11-25 21:30:39,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3070753.3333333335, ans=0.0
2023-11-25 21:30:41,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3070820.0, ans=0.125
2023-11-25 21:31:09,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=22.5
2023-11-25 21:31:09,772 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460650
2023-11-25 21:31:14,934 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3750, loss[loss=0.07529, simple_loss=0.1025, pruned_loss=0.01547, audio_tagging_loss=0.008564, over 15367.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09068, pruned_loss=0.01317, audio_tagging_loss=0.009187, over 3044270.74 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:31:25,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3071086.6666666665, ans=0.0
2023-11-25 21:31:45,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0
2023-11-25 21:31:45,430 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 8.797e+01 9.429e+01 1.022e+02 1.345e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-25 21:31:47,784 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:31:53,788 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 21:31:54,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3071220.0, ans=0.0
2023-11-25 21:32:04,170 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460700
2023-11-25 21:32:09,374 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3800, loss[loss=0.06245, simple_loss=0.07608, pruned_loss=0.0135, audio_tagging_loss=0.01091, over 14714.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.08995, pruned_loss=0.0129, audio_tagging_loss=0.009277, over 3050383.81 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
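[Annotation] The [optim.py:476] lines summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max) together with the active clipping threshold. In every entry in this section the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.429e+01 = 1.886e+02 just above, and percent-clipped is the fraction of recent steps whose norm exceeded it. A sketch of that relation (the median-based rule is read off the logged numbers; the real optimizer may compute its running statistic differently):

```python
# The five logged values are min / 25% / median / 75% / max of recent
# gradient norms; the threshold tracks clipping_scale * median.
quartiles = [7.367e+01, 8.797e+01, 9.429e+01, 1.022e+02, 1.345e+02]

def clip_threshold(median_grad_norm: float, clipping_scale: float = 2.0) -> float:
    return clipping_scale * median_grad_norm

print(clip_threshold(quartiles[2]))  # 188.58, i.e. the logged threshold=1.886e+02
```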
2023-11-25 21:32:09,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3071353.3333333335, ans=0.05
2023-11-25 21:32:15,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3071353.3333333335, ans=0.125
2023-11-25 21:32:37,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3071486.6666666665, ans=0.5
2023-11-25 21:32:45,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3071553.3333333335, ans=0.125
2023-11-25 21:32:54,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3071620.0, ans=0.025
2023-11-25 21:32:59,683 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460750
2023-11-25 21:33:02,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3071620.0, ans=0.125
2023-11-25 21:33:04,716 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:33:05,935 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3850, loss[loss=0.06265, simple_loss=0.08268, pruned_loss=0.01113, audio_tagging_loss=0.01017, over 16184.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.0912, pruned_loss=0.01304, audio_tagging_loss=0.00932, over 3053680.26 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:33:30,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3071820.0, ans=0.1
2023-11-25 21:33:34,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.515e+01 9.072e+01 9.640e+01 1.260e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-25 21:33:34,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3071820.0, ans=0.0
2023-11-25 21:33:35,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3071820.0, ans=0.025
2023-11-25 21:33:42,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3071886.6666666665, ans=0.025
2023-11-25 21:33:50,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3071953.3333333335, ans=0.0
2023-11-25 21:33:55,451 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460800
2023-11-25 21:34:00,957 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3900, loss[loss=0.08646, simple_loss=0.1225, pruned_loss=0.01859, audio_tagging_loss=0.00663, over 14392.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.0912, pruned_loss=0.01299, audio_tagging_loss=0.009252, over 3045274.77 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:34:13,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3072086.6666666665, ans=0.1
2023-11-25 21:34:41,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3072220.0, ans=0.0
2023-11-25 21:34:44,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0
2023-11-25 21:34:46,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3072286.6666666665, ans=0.025
2023-11-25 21:34:50,186 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460850
2023-11-25 21:34:51,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3072286.6666666665, ans=0.125
2023-11-25 21:34:55,275 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3950, loss[loss=0.06316, simple_loss=0.08909, pruned_loss=0.01051, audio_tagging_loss=0.008108, over 16571.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09104, pruned_loss=0.01295, audio_tagging_loss=0.009312, over 3044487.05 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:34:59,755 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:35:04,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3072353.3333333335, ans=0.0
2023-11-25 21:35:08,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3072420.0, ans=0.125
2023-11-25 21:35:14,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0
2023-11-25 21:35:17,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3072486.6666666665, ans=0.0
2023-11-25 21:35:26,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.593e+01 9.164e+01 9.900e+01 1.243e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-25 21:35:28,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0
2023-11-25 21:35:31,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3072553.3333333335, ans=0.0
2023-11-25 21:35:45,356 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460900
2023-11-25 21:35:51,039 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4000, loss[loss=0.05593, simple_loss=0.06564, pruned_loss=0.009989, audio_tagging_loss=0.01312, over 14310.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.091, pruned_loss=0.01299, audio_tagging_loss=0.009384, over 3042459.71 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
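[Annotation] The [scaling.py:1022] Whitening lines report a statistic of each module's activations ("metric") against a per-module limit; nothing is printed unless the metric is being checked, and a corrective gradient would only apply while metric exceeds limit. A sketch, assuming the metric is d * tr(C @ C) / tr(C)^2 of the feature covariance C, which is 1.0 for perfectly isotropic ("white") activations and grows as the spectrum becomes uneven; this formula is an assumption about scaling.py, chosen because it matches the reported ranges, and entries with num_groups > 1 would apply it per channel group:

```python
import torch

# Sketch of a whitening metric like the "metric=... vs. limit=..." entries.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    x = x.reshape(-1, x.shape[-1])   # (N, d) activations
    x = x - x.mean(dim=0)
    c = (x.t() @ x) / x.shape[0]     # covariance, (d, d)
    d = c.shape[0]
    return d * (c * c).sum() / c.trace() ** 2

x = torch.randn(1000, 384)           # roughly white activations
print(float(whitening_metric(x)))    # ~1.4, comfortably under limit=15.0
```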
2023-11-25 21:36:40,663 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460950
2023-11-25 21:36:40,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3072953.3333333335, ans=0.125
2023-11-25 21:36:42,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3072953.3333333335, ans=0.125
2023-11-25 21:36:46,398 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4050, loss[loss=0.06364, simple_loss=0.08548, pruned_loss=0.01135, audio_tagging_loss=0.009548, over 14602.00 frames. ], tot_loss[loss=0.06828, simple_loss=0.09165, pruned_loss=0.01309, audio_tagging_loss=0.009369, over 3048776.88 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:36:50,625 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 21:37:15,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3073153.3333333335, ans=0.0
2023-11-25 21:37:15,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3073153.3333333335, ans=0.95
2023-11-25 21:37:18,686 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:37:19,531 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.880e+01 9.594e+01 1.042e+02 1.593e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-25 21:37:26,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2023-11-25 21:37:35,834 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461000
2023-11-25 21:37:41,380 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4100, loss[loss=0.05458, simple_loss=0.06672, pruned_loss=0.007325, audio_tagging_loss=0.0139, over 13733.00 frames. ], tot_loss[loss=0.06853, simple_loss=0.09173, pruned_loss=0.01323, audio_tagging_loss=0.00944, over 3045147.16 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:37:46,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3073353.3333333335, ans=0.1
2023-11-25 21:37:51,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3073420.0, ans=0.0
2023-11-25 21:37:54,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3073420.0, ans=0.125
2023-11-25 21:38:20,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3073553.3333333335, ans=0.125
2023-11-25 21:38:23,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3073553.3333333335, ans=0.0
2023-11-25 21:38:31,274 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461050
2023-11-25 21:38:36,935 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4150, loss[loss=0.0617, simple_loss=0.08234, pruned_loss=0.009376, audio_tagging_loss=0.01115, over 15309.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.0916, pruned_loss=0.01308, audio_tagging_loss=0.009323, over 3045429.85 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:38:37,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3073686.6666666665, ans=0.125
2023-11-25 21:38:51,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3073753.3333333335, ans=0.0
2023-11-25 21:39:00,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3073820.0, ans=0.125
2023-11-25 21:39:09,591 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.760e+01 9.274e+01 9.766e+01 1.268e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-25 21:39:11,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3073886.6666666665, ans=0.0
2023-11-25 21:39:17,901 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 21:39:19,148 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:39:26,825 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461100
2023-11-25 21:39:28,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.42 vs. limit=10.0
2023-11-25 21:39:32,006 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4200, loss[loss=0.06111, simple_loss=0.07787, pruned_loss=0.01401, audio_tagging_loss=0.008165, over 14817.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09149, pruned_loss=0.01286, audio_tagging_loss=0.009254, over 3043075.67 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 8.0
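[Annotation] grad_scale in the batch lines is the loss-scaling factor used for mixed-precision training; over this stretch it steps 32.0 -> 16.0 -> 8.0 and later climbs back up, the classic dynamic-loss-scale pattern: halve when a step sees inf/nan gradients, double again after a run of clean steps. A pure-Python sketch of that update rule; the growth interval of 2000 is an assumed default, not something read from this log:

```python
# Sketch of dynamic loss scaling as reflected in the grad_scale values.
def update_grad_scale(scale: float, found_inf: bool, good_steps: int,
                      growth_interval: int = 2000):
    """One scaler update: back off on overflow, grow after clean streaks."""
    if found_inf:
        return scale * 0.5, 0      # e.g. 32.0 -> 16.0 -> 8.0
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * 2.0, 0      # e.g. 8.0 -> 16.0 -> 32.0
    return scale, good_steps
```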
2023-11-25 21:39:42,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3074086.6666666665, ans=0.2
2023-11-25 21:39:57,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3074153.3333333335, ans=0.07
2023-11-25 21:39:59,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3074153.3333333335, ans=0.125
2023-11-25 21:40:06,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.58 vs. limit=10.0
2023-11-25 21:40:06,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3074220.0, ans=0.04949747468305833
2023-11-25 21:40:21,664 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461150
2023-11-25 21:40:27,301 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4250, loss[loss=0.06555, simple_loss=0.08518, pruned_loss=0.01489, audio_tagging_loss=0.008075, over 15111.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09151, pruned_loss=0.0128, audio_tagging_loss=0.009172, over 3044045.17 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:40:27,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3074353.3333333335, ans=0.125
2023-11-25 21:40:31,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3074353.3333333335, ans=0.1
2023-11-25 21:41:00,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.649e+01 9.520e+01 1.007e+02 1.325e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-25 21:41:04,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3074553.3333333335, ans=0.125
2023-11-25 21:41:08,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.54 vs. limit=15.0
2023-11-25 21:41:13,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3074620.0, ans=0.2
2023-11-25 21:41:13,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=12.0
2023-11-25 21:41:16,485 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461200
2023-11-25 21:41:18,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3074620.0, ans=0.125
2023-11-25 21:41:22,525 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4300, loss[loss=0.07254, simple_loss=0.1034, pruned_loss=0.01352, audio_tagging_loss=0.007317, over 14575.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09166, pruned_loss=0.01285, audio_tagging_loss=0.008941, over 3046548.78 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:41:25,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3074686.6666666665, ans=0.0
2023-11-25 21:41:31,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3074686.6666666665, ans=0.0
2023-11-25 21:41:39,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3074753.3333333335, ans=0.2
2023-11-25 21:41:59,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3074886.6666666665, ans=0.05
2023-11-25 21:42:13,025 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461250
2023-11-25 21:42:15,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3074953.3333333335, ans=0.2
2023-11-25 21:42:18,144 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4350, loss[loss=0.06933, simple_loss=0.09833, pruned_loss=0.01138, audio_tagging_loss=0.008789, over 15013.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09162, pruned_loss=0.01287, audio_tagging_loss=0.008866, over 3038144.50 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:42:51,203 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.819e+01 9.341e+01 1.009e+02 3.956e+02, threshold=1.868e+02, percent-clipped=1.0
2023-11-25 21:42:51,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3075220.0, ans=0.125
2023-11-25 21:42:55,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3075220.0, ans=0.125
2023-11-25 21:42:57,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3075220.0, ans=0.125
2023-11-25 21:43:07,472 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461300
2023-11-25 21:43:12,644 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4400, loss[loss=0.07981, simple_loss=0.1137, pruned_loss=0.01494, audio_tagging_loss=0.008022, over 15271.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09121, pruned_loss=0.01284, audio_tagging_loss=0.008994, over 3044300.84 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:43:43,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3075486.6666666665, ans=0.0
2023-11-25 21:43:51,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3075553.3333333335, ans=0.0
2023-11-25 21:43:54,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3075553.3333333335, ans=0.07
2023-11-25 21:43:57,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.29 vs. limit=22.5
2023-11-25 21:44:02,309 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461350
2023-11-25 21:44:02,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=12.0
2023-11-25 21:44:07,977 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4450, loss[loss=0.09522, simple_loss=0.1261, pruned_loss=0.024, audio_tagging_loss=0.008158, over 15112.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09131, pruned_loss=0.01285, audio_tagging_loss=0.00889, over 3050972.45 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:44:22,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3075753.3333333335, ans=0.2
2023-11-25 21:44:36,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3075820.0, ans=0.2
2023-11-25 21:44:41,812 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.665e+01 9.390e+01 1.023e+02 1.193e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-25 21:44:45,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3075886.6666666665, ans=0.125
2023-11-25 21:44:56,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0
2023-11-25 21:44:57,523 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461400
2023-11-25 21:45:03,431 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4500, loss[loss=0.06662, simple_loss=0.08988, pruned_loss=0.009973, audio_tagging_loss=0.01171, over 15142.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09144, pruned_loss=0.0129, audio_tagging_loss=0.008856, over 3052986.67 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:45:03,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3076020.0, ans=0.0
2023-11-25 21:45:16,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0
2023-11-25 21:45:27,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3076153.3333333335, ans=0.125
2023-11-25 21:45:46,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3076286.6666666665, ans=0.125
2023-11-25 21:45:49,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3076286.6666666665, ans=0.1
2023-11-25 21:45:52,287 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461450
2023-11-25 21:45:57,459 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4550, loss[loss=0.04878, simple_loss=0.06319, pruned_loss=0.007263, audio_tagging_loss=0.009921, over 14842.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.0912, pruned_loss=0.01305, audio_tagging_loss=0.00896, over 3047983.22 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:46:13,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3076420.0, ans=0.09899494936611666
2023-11-25 21:46:20,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0
2023-11-25 21:46:21,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3076486.6666666665, ans=0.125
2023-11-25 21:46:30,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0
2023-11-25 21:46:30,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3076553.3333333335, ans=0.125
2023-11-25 21:46:31,609 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.489e+01 8.972e+01 9.819e+01 1.195e+02, threshold=1.794e+02, percent-clipped=0.0
2023-11-25 21:46:35,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3076553.3333333335, ans=0.0
2023-11-25 21:46:39,172 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:46:40,038 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 21:46:46,241 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461500
2023-11-25 21:46:51,812 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4600, loss[loss=0.05859, simple_loss=0.07885, pruned_loss=0.01099, audio_tagging_loss=0.00817, over 13938.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09086, pruned_loss=0.01294, audio_tagging_loss=0.00901, over 3045889.13 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:47:08,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=22.5
2023-11-25 21:47:25,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3076886.6666666665, ans=0.09899494936611666
2023-11-25 21:47:26,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3076886.6666666665, ans=0.0
2023-11-25 21:47:33,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3076886.6666666665, ans=0.0
2023-11-25 21:47:34,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3076953.3333333335, ans=0.0
2023-11-25 21:47:41,678 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461550
2023-11-25 21:47:43,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3076953.3333333335, ans=15.0
2023-11-25 21:47:47,364 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4650, loss[loss=0.06734, simple_loss=0.09519, pruned_loss=0.01296, audio_tagging_loss=0.006784, over 15544.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09009, pruned_loss=0.01286, audio_tagging_loss=0.009132, over 3047517.50 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:48:01,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0
2023-11-25 21:48:10,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3077153.3333333335, ans=0.1
2023-11-25 21:48:20,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.66 vs. limit=22.5
2023-11-25 21:48:20,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.563e+01 9.172e+01 1.006e+02 1.160e+02, threshold=1.834e+02, percent-clipped=0.0
2023-11-25 21:48:36,258 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461600
2023-11-25 21:48:41,724 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4700, loss[loss=0.05464, simple_loss=0.07101, pruned_loss=0.007262, audio_tagging_loss=0.01188, over 15505.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09089, pruned_loss=0.01304, audio_tagging_loss=0.009268, over 3046522.66 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:48:44,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3077353.3333333335, ans=0.2
2023-11-25 21:48:46,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3077353.3333333335, ans=0.0
2023-11-25 21:49:18,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3077553.3333333335, ans=0.125
2023-11-25 21:49:29,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461650
2023-11-25 21:49:35,063 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4750, loss[loss=0.08438, simple_loss=0.1135, pruned_loss=0.01639, audio_tagging_loss=0.01125, over 15408.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09058, pruned_loss=0.01282, audio_tagging_loss=0.009352, over 3048244.54 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:49:42,572 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:50:09,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.924e+01 9.307e+01 1.025e+02 1.203e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-25 21:50:16,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0
2023-11-25 21:50:24,996 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461700
2023-11-25 21:50:28,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3077953.3333333335, ans=0.0
2023-11-25 21:50:30,548 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4800, loss[loss=0.06902, simple_loss=0.09131, pruned_loss=0.01131, audio_tagging_loss=0.01206, over 15165.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09063, pruned_loss=0.0128, audio_tagging_loss=0.009443, over 3047928.42 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
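[Annotation] The [scaling.py:1118] WithLoss entries attach an auxiliary penalty to the named attention-weight tensors and report its accumulated value; loss-sum=0.000e+00 throughout this section, i.e. the penalty is contributing nothing at this point in training. A schematic of the general pattern only; the actual mechanism in scaling.py is not visible in this log, so everything below is an assumption:

```python
import torch

class WithAuxLoss(torch.nn.Module):
    """Pass activations through unchanged while accumulating a penalty."""
    def __init__(self, weight: float = 0.0):
        super().__init__()
        self.weight = weight
        self.loss_sum = 0.0            # what the log prints as loss-sum

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and self.weight > 0.0:
            penalty = self.weight * (x ** 2).mean()
            self.loss_sum += float(penalty.detach())
            # a real implementation would feed the penalty into the backward
            # pass (e.g. via a custom autograd function); with weight == 0
            # nothing is added, matching loss-sum=0.000e+00 above
        return x
```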
], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:50:39,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3078020.0, ans=0.125 2023-11-25 21:50:44,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.79 vs. limit=22.5 2023-11-25 21:50:47,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2023-11-25 21:50:56,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2023-11-25 21:51:13,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3078286.6666666665, ans=0.0 2023-11-25 21:51:17,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3078286.6666666665, ans=0.1 2023-11-25 21:51:19,478 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461750 2023-11-25 21:51:24,559 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4850, loss[loss=0.05971, simple_loss=0.07898, pruned_loss=0.01, audio_tagging_loss=0.01021, over 15719.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09018, pruned_loss=0.01273, audio_tagging_loss=0.009525, over 3043500.68 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:51:24,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3078353.3333333335, ans=0.0 2023-11-25 21:51:27,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3078353.3333333335, ans=0.125 2023-11-25 21:51:38,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2023-11-25 21:51:49,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3078486.6666666665, ans=0.125 2023-11-25 21:51:53,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.82 vs. limit=15.0 2023-11-25 21:51:54,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3078486.6666666665, ans=0.0 2023-11-25 21:51:58,151 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 8.740e+01 9.474e+01 1.031e+02 1.193e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-25 21:52:12,718 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461800 2023-11-25 21:52:18,110 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4900, loss[loss=0.06953, simple_loss=0.08769, pruned_loss=0.01576, audio_tagging_loss=0.009927, over 15453.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09098, pruned_loss=0.01283, audio_tagging_loss=0.009461, over 3044354.33 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:52:27,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3078686.6666666665, ans=0.0 2023-11-25 21:52:47,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2023-11-25 21:53:07,347 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461850 2023-11-25 21:53:12,995 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4950, loss[loss=0.08914, simple_loss=0.1242, pruned_loss=0.02014, audio_tagging_loss=0.006887, over 15858.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09098, pruned_loss=0.01276, audio_tagging_loss=0.009227, over 3043132.16 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:53:13,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3079020.0, ans=0.0 2023-11-25 21:53:21,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3079020.0, ans=0.2 2023-11-25 21:53:46,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.693e+01 9.293e+01 9.943e+01 1.246e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 21:53:46,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3079220.0, ans=0.1 2023-11-25 21:53:51,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3079220.0, ans=0.035 2023-11-25 21:54:02,739 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461900 2023-11-25 21:54:05,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3079286.6666666665, ans=0.125 2023-11-25 21:54:07,911 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5000, loss[loss=0.06405, simple_loss=0.08068, pruned_loss=0.0135, audio_tagging_loss=0.01021, over 15335.00 frames. ], tot_loss[loss=0.06771, simple_loss=0.09149, pruned_loss=0.01291, audio_tagging_loss=0.009047, over 3048952.98 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:54:10,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3079353.3333333335, ans=0.1 2023-11-25 21:54:11,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.56 vs. limit=15.0 2023-11-25 21:54:49,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3079553.3333333335, ans=0.125 2023-11-25 21:54:50,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3079620.0, ans=0.125 2023-11-25 21:54:55,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3079620.0, ans=0.125 2023-11-25 21:54:56,516 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461950 2023-11-25 21:55:01,799 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5050, loss[loss=0.03604, simple_loss=0.04387, pruned_loss=0.003966, audio_tagging_loss=0.01014, over 16280.00 frames. 
], tot_loss[loss=0.06739, simple_loss=0.09109, pruned_loss=0.01275, audio_tagging_loss=0.009093, over 3045347.10 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:55:10,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3079686.6666666665, ans=0.0 2023-11-25 21:55:27,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3079820.0, ans=0.0 2023-11-25 21:55:28,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3079820.0, ans=0.125 2023-11-25 21:55:35,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0 2023-11-25 21:55:35,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.539e+01 9.066e+01 9.676e+01 1.144e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-25 21:55:36,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3079886.6666666665, ans=0.125 2023-11-25 21:55:40,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3079886.6666666665, ans=0.0 2023-11-25 21:55:46,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3079953.3333333335, ans=0.0 2023-11-25 21:55:50,297 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462000 2023-11-25 21:55:56,009 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5100, loss[loss=0.08288, simple_loss=0.1133, pruned_loss=0.01821, audio_tagging_loss=0.008009, over 15189.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09055, pruned_loss=0.01265, audio_tagging_loss=0.009086, over 3044041.58 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:56:06,652 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:56:07,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3080086.6666666665, ans=0.2 2023-11-25 21:56:13,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3080086.6666666665, ans=0.0 2023-11-25 21:56:18,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3080153.3333333335, ans=0.2 2023-11-25 21:56:20,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3080153.3333333335, ans=0.125 2023-11-25 21:56:35,598 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:56:45,792 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462050 2023-11-25 21:56:51,455 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5150, loss[loss=0.09516, simple_loss=0.1325, pruned_loss=0.02197, audio_tagging_loss=0.006917, over 16450.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09053, pruned_loss=0.01268, audio_tagging_loss=0.009102, over 3045432.55 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:56:54,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3080353.3333333335, ans=0.0 2023-11-25 21:56:54,783 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:57:04,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.05 vs. limit=6.0 2023-11-25 21:57:05,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=15.0 2023-11-25 21:57:25,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.763e+01 9.349e+01 9.902e+01 1.210e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-25 21:57:28,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3080553.3333333335, ans=0.125 2023-11-25 21:57:29,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3080553.3333333335, ans=0.125 2023-11-25 21:57:30,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3080553.3333333335, ans=0.0 2023-11-25 21:57:40,139 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462100 2023-11-25 21:57:45,285 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5200, loss[loss=0.07318, simple_loss=0.1001, pruned_loss=0.01404, audio_tagging_loss=0.009102, over 15917.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09122, pruned_loss=0.01278, audio_tagging_loss=0.009059, over 3039740.74 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:57:49,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3080686.6666666665, ans=0.0 2023-11-25 21:57:58,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3080753.3333333335, ans=0.0 2023-11-25 21:58:04,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3080753.3333333335, ans=0.2 2023-11-25 21:58:12,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3080820.0, ans=0.0 2023-11-25 21:58:18,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3080886.6666666665, ans=0.125 2023-11-25 21:58:25,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2023-11-25 21:58:33,957 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462150 2023-11-25 21:58:34,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=15.0 2023-11-25 21:58:39,643 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5250, loss[loss=0.07375, simple_loss=0.09409, pruned_loss=0.01547, audio_tagging_loss=0.01123, over 14568.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09101, pruned_loss=0.01287, audio_tagging_loss=0.009028, over 3041026.58 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:58:40,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5 2023-11-25 21:58:43,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3081020.0, ans=0.125 2023-11-25 21:58:48,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3081020.0, ans=0.125 2023-11-25 21:59:14,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.546e+01 9.251e+01 9.912e+01 1.159e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-25 21:59:29,352 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462200 2023-11-25 21:59:30,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2023-11-25 21:59:34,794 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5300, loss[loss=0.06783, simple_loss=0.08728, pruned_loss=0.01152, audio_tagging_loss=0.01266, over 14649.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09113, pruned_loss=0.01291, audio_tagging_loss=0.009009, over 3044084.28 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:59:36,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3081353.3333333335, ans=0.0 2023-11-25 21:59:43,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3081353.3333333335, ans=0.125 2023-11-25 21:59:55,275 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:59:57,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3081486.6666666665, ans=0.125 2023-11-25 22:00:05,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3081486.6666666665, ans=0.0 2023-11-25 22:00:08,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3081553.3333333335, ans=0.0 2023-11-25 22:00:20,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3081620.0, ans=0.125 2023-11-25 22:00:23,664 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462250 2023-11-25 22:00:29,314 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5350, loss[loss=0.08573, simple_loss=0.1122, pruned_loss=0.01903, audio_tagging_loss=0.01059, over 14593.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09143, pruned_loss=0.01288, audio_tagging_loss=0.009014, over 3046437.36 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:00:30,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3081686.6666666665, ans=0.0 2023-11-25 22:00:41,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3081753.3333333335, ans=0.1 2023-11-25 22:00:47,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3081753.3333333335, ans=0.125 2023-11-25 22:00:50,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3081820.0, ans=0.125 2023-11-25 22:01:01,156 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:01:02,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3081886.6666666665, ans=0.0 2023-11-25 22:01:04,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.802e+01 9.245e+01 1.006e+02 1.324e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-25 22:01:18,108 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462300 2023-11-25 22:01:23,236 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5400, loss[loss=0.0994, simple_loss=0.1224, pruned_loss=0.0293, audio_tagging_loss=0.008894, over 14460.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09155, pruned_loss=0.01285, audio_tagging_loss=0.008921, over 3042040.07 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:01:41,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3082086.6666666665, ans=0.0 2023-11-25 22:02:02,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.45 vs. limit=22.5 2023-11-25 22:02:09,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=12.0 2023-11-25 22:02:11,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3082286.6666666665, ans=10.0 2023-11-25 22:02:13,182 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462350 2023-11-25 22:02:13,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3082286.6666666665, ans=0.2 2023-11-25 22:02:18,826 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5450, loss[loss=0.06519, simple_loss=0.0808, pruned_loss=0.01448, audio_tagging_loss=0.01031, over 15462.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09126, pruned_loss=0.01294, audio_tagging_loss=0.009, over 3037699.56 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:02:22,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3082353.3333333335, ans=0.125 2023-11-25 22:02:35,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3082420.0, ans=0.0 2023-11-25 22:02:37,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3082420.0, ans=0.1 2023-11-25 22:02:52,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3082553.3333333335, ans=0.125 2023-11-25 22:02:53,084 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.739e+01 9.442e+01 1.018e+02 1.459e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-25 22:02:53,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3082553.3333333335, ans=0.0 2023-11-25 22:02:56,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3082553.3333333335, ans=0.0 2023-11-25 22:03:05,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3082620.0, ans=0.125 2023-11-25 22:03:07,613 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462400 2023-11-25 22:03:12,951 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5500, loss[loss=0.07376, simple_loss=0.09482, pruned_loss=0.0149, audio_tagging_loss=0.01145, over 15787.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09145, pruned_loss=0.01297, audio_tagging_loss=0.008976, over 3040945.89 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:03:20,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2023-11-25 22:03:20,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-25 22:03:33,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3082820.0, ans=0.1 2023-11-25 22:03:57,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3082953.3333333335, ans=0.0 2023-11-25 22:04:02,248 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462450 2023-11-25 22:04:07,412 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5550, loss[loss=0.0673, simple_loss=0.09129, pruned_loss=0.01181, audio_tagging_loss=0.009839, over 15313.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09037, pruned_loss=0.01271, audio_tagging_loss=0.00923, over 3036754.25 frames. 
], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:04:09,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3083020.0, ans=0.0 2023-11-25 22:04:42,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.648e+01 9.293e+01 9.970e+01 1.288e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 22:04:56,910 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462500 2023-11-25 22:05:03,009 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5600, loss[loss=0.0829, simple_loss=0.1203, pruned_loss=0.01565, audio_tagging_loss=0.007098, over 16163.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09179, pruned_loss=0.01297, audio_tagging_loss=0.009279, over 3046945.23 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:05:31,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=12.0 2023-11-25 22:05:36,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3083553.3333333335, ans=0.0 2023-11-25 22:05:43,054 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:05:51,816 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462550 2023-11-25 22:05:54,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3083620.0, ans=0.0 2023-11-25 22:05:56,941 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5650, loss[loss=0.05079, simple_loss=0.07092, pruned_loss=0.006845, audio_tagging_loss=0.008483, over 15254.00 frames. ], tot_loss[loss=0.06852, simple_loss=0.09239, pruned_loss=0.01306, audio_tagging_loss=0.009266, over 3052567.02 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:06:03,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3083686.6666666665, ans=0.125 2023-11-25 22:06:30,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3083886.6666666665, ans=0.125 2023-11-25 22:06:32,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.554e+01 9.201e+01 9.858e+01 1.570e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-25 22:06:45,781 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462600 2023-11-25 22:06:51,775 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5700, loss[loss=0.06244, simple_loss=0.08417, pruned_loss=0.01304, audio_tagging_loss=0.007313, over 16463.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09193, pruned_loss=0.0129, audio_tagging_loss=0.009273, over 3043951.50 frames. 
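The WARNING above records why a cut is unusable: after roughly 4x subsampling, a 100-frame cut keeps only 23 encoder frames, one fewer than its 24 BPE tokens, so the transducer loss has no valid alignment and the cut is skipped. The check reduces to something like the following (function and argument names are illustrative, not the exact train_asr.py code):

    def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
        # a transducer needs at least one encoder frame per output token
        return frames_after_subsampling >= num_tokens

    print(keep_cut(23, 24))  # False -> excluded, matching the warning above
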
], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:06:54,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3084020.0, ans=0.125 2023-11-25 22:07:06,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3084086.6666666665, ans=0.125 2023-11-25 22:07:06,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3084086.6666666665, ans=0.0 2023-11-25 22:07:14,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3084153.3333333335, ans=0.125 2023-11-25 22:07:15,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3084153.3333333335, ans=0.125 2023-11-25 22:07:21,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3084153.3333333335, ans=0.125 2023-11-25 22:07:35,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3084286.6666666665, ans=0.1 2023-11-25 22:07:37,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3084286.6666666665, ans=0.1 2023-11-25 22:07:41,330 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462650 2023-11-25 22:07:44,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3084286.6666666665, ans=0.07 2023-11-25 22:07:46,930 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5750, loss[loss=0.05796, simple_loss=0.07687, pruned_loss=0.008766, audio_tagging_loss=0.01076, over 16140.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09116, pruned_loss=0.01272, audio_tagging_loss=0.009158, over 3052399.37 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:07:56,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3084420.0, ans=0.125 2023-11-25 22:07:56,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3084420.0, ans=0.0 2023-11-25 22:08:13,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3084486.6666666665, ans=0.2 2023-11-25 22:08:21,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.632e+01 9.114e+01 9.911e+01 1.968e+02, threshold=1.823e+02, percent-clipped=1.0 2023-11-25 22:08:35,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3084620.0, ans=0.125 2023-11-25 22:08:36,342 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462700 2023-11-25 22:08:36,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3084620.0, ans=0.125 2023-11-25 22:08:41,438 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5800, loss[loss=0.05655, simple_loss=0.07829, pruned_loss=0.008929, audio_tagging_loss=0.00847, over 14658.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09178, pruned_loss=0.01275, audio_tagging_loss=0.009018, over 3048889.77 frames. 
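Across these records grad_scale moves between 8.0, 16.0 and 32.0 in powers of two, the signature of dynamic fp16 loss scaling: the scaler doubles the scale after a run of overflow-free steps and halves it whenever a gradient overflows. A generic torch.cuda.amp loop showing where that value lives; this is standard PyTorch usage, not the exact train_asr.py training loop:

    import torch

    scaler = torch.cuda.amp.GradScaler()  # grows/shrinks the scale dynamically

    def train_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)     # forward in mixed precision
        scaler.scale(loss).backward()        # backward on the scaled loss
        scaler.step(optimizer)               # skips the step on overflow
        scaler.update()                      # doubles or halves the scale
        return scaler.get_scale()            # the value logged as grad_scale
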
], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:08:49,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3084686.6666666665, ans=0.125 2023-11-25 22:09:06,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2023-11-25 22:09:30,149 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462750 2023-11-25 22:09:31,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3084953.3333333335, ans=0.125 2023-11-25 22:09:32,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3084953.3333333335, ans=0.125 2023-11-25 22:09:35,305 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5850, loss[loss=0.08838, simple_loss=0.1152, pruned_loss=0.02295, audio_tagging_loss=0.007825, over 15752.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09199, pruned_loss=0.01289, audio_tagging_loss=0.008937, over 3050316.05 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:09:56,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3085153.3333333335, ans=0.07 2023-11-25 22:10:00,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3085153.3333333335, ans=0.1 2023-11-25 22:10:12,171 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.550e+01 9.214e+01 9.901e+01 1.645e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-25 22:10:24,282 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462800 2023-11-25 22:10:30,187 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5900, loss[loss=0.05597, simple_loss=0.07155, pruned_loss=0.01096, audio_tagging_loss=0.009232, over 13982.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.0913, pruned_loss=0.01281, audio_tagging_loss=0.00887, over 3047364.75 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:10:39,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3085353.3333333335, ans=0.125 2023-11-25 22:10:55,370 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:11:19,900 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462850 2023-11-25 22:11:20,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3085620.0, ans=0.125 2023-11-25 22:11:24,930 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5950, loss[loss=0.05836, simple_loss=0.08345, pruned_loss=0.008538, audio_tagging_loss=0.008098, over 16064.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09017, pruned_loss=0.01275, audio_tagging_loss=0.008962, over 3053068.13 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:11:32,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3085686.6666666665, ans=0.125 2023-11-25 22:11:35,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.95 vs. 
limit=15.0 2023-11-25 22:11:39,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3085753.3333333335, ans=0.0 2023-11-25 22:12:01,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3085886.6666666665, ans=0.2 2023-11-25 22:12:02,509 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.667e+01 9.136e+01 9.803e+01 1.331e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-25 22:12:08,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3085953.3333333335, ans=0.1 2023-11-25 22:12:14,026 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462900 2023-11-25 22:12:19,210 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6000, loss[loss=0.06374, simple_loss=0.09057, pruned_loss=0.009359, audio_tagging_loss=0.009098, over 14981.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09077, pruned_loss=0.01274, audio_tagging_loss=0.008917, over 3049526.85 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:12:19,210 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-25 22:12:35,602 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3378, 5.0244, 4.7403, 5.1654], device='cuda:1') 2023-11-25 22:12:50,932 INFO [train_asr.py:1267] (1/4) Epoch 39, validation: loss=0.05816, simple_loss=0.05073, pruned_loss=0.00518, audio_tagging_loss=0.02762, over 4681554.00 frames. 2023-11-25 22:12:50,932 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-25 22:12:59,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.79 vs. limit=15.0 2023-11-25 22:13:16,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3086153.3333333335, ans=0.125 2023-11-25 22:13:31,829 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:13:31,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3086220.0, ans=0.07 2023-11-25 22:13:40,648 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462950 2023-11-25 22:13:44,906 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:13:45,810 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6050, loss[loss=0.06441, simple_loss=0.09343, pruned_loss=0.009162, audio_tagging_loss=0.008531, over 15379.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09174, pruned_loss=0.01298, audio_tagging_loss=0.00891, over 3050539.63 frames. 
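The batch-6000 validation pass above also prints the peak GPU allocation, which matches PyTorch's CUDA memory counter; a helper in that spirit (the device string and function name are assumptions, not the exact logging code):

    import torch

    def peak_memory_line(device: str = "cuda:1") -> str:
        # torch.cuda.max_memory_allocated reports the peak in bytes
        mb = torch.cuda.max_memory_allocated(torch.device(device)) // (1024 * 1024)
        return f"Maximum memory allocated so far is {mb}MB"
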
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:13:51,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3086353.3333333335, ans=0.125 2023-11-25 22:14:01,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=22.5 2023-11-25 22:14:21,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3086553.3333333335, ans=0.125 2023-11-25 22:14:23,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.707e+01 9.356e+01 1.011e+02 1.518e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-25 22:14:28,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3086620.0, ans=0.0 2023-11-25 22:14:35,136 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463000 2023-11-25 22:14:40,565 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6100, loss[loss=0.06948, simple_loss=0.09648, pruned_loss=0.01132, audio_tagging_loss=0.009929, over 14579.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09155, pruned_loss=0.0129, audio_tagging_loss=0.00899, over 3052592.53 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:14:40,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3086686.6666666665, ans=0.2 2023-11-25 22:14:51,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.65 vs. limit=15.0 2023-11-25 22:15:02,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3086820.0, ans=0.0 2023-11-25 22:15:16,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3086886.6666666665, ans=0.2 2023-11-25 22:15:29,932 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463050 2023-11-25 22:15:32,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3086953.3333333335, ans=0.0 2023-11-25 22:15:33,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2023-11-25 22:15:36,143 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6150, loss[loss=0.07402, simple_loss=0.1005, pruned_loss=0.0151, audio_tagging_loss=0.008684, over 16606.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09185, pruned_loss=0.01296, audio_tagging_loss=0.009045, over 3048486.70 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:15:37,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3087020.0, ans=0.125 2023-11-25 22:15:46,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3087086.6666666665, ans=0.95 2023-11-25 22:15:53,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. 
limit=22.5 2023-11-25 22:15:55,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3087086.6666666665, ans=0.125 2023-11-25 22:15:59,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3087153.3333333335, ans=0.125 2023-11-25 22:16:08,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087220.0, ans=0.1 2023-11-25 22:16:11,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5 2023-11-25 22:16:12,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=12.0 2023-11-25 22:16:14,306 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 8.733e+01 9.242e+01 9.873e+01 1.239e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-25 22:16:26,396 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463100 2023-11-25 22:16:28,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3087286.6666666665, ans=0.125 2023-11-25 22:16:31,579 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6200, loss[loss=0.06151, simple_loss=0.08089, pruned_loss=0.01056, audio_tagging_loss=0.01051, over 15628.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09102, pruned_loss=0.01279, audio_tagging_loss=0.009129, over 3055328.64 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:16:36,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3087353.3333333335, ans=0.125 2023-11-25 22:16:45,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-11-25 22:16:56,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3087486.6666666665, ans=0.1 2023-11-25 22:17:09,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3087553.3333333335, ans=0.1 2023-11-25 22:17:13,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3087553.3333333335, ans=0.0 2023-11-25 22:17:19,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0 2023-11-25 22:17:20,717 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463150 2023-11-25 22:17:25,792 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6250, loss[loss=0.06711, simple_loss=0.09168, pruned_loss=0.01044, audio_tagging_loss=0.01083, over 15143.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09069, pruned_loss=0.01269, audio_tagging_loss=0.009133, over 3057365.63 frames. 
], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:17:46,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3087753.3333333335, ans=0.125 2023-11-25 22:17:53,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3087820.0, ans=0.125 2023-11-25 22:17:59,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3087886.6666666665, ans=0.125 2023-11-25 22:18:02,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-25 22:18:04,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087886.6666666665, ans=0.1 2023-11-25 22:18:04,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.687e+01 9.120e+01 9.665e+01 2.497e+02, threshold=1.824e+02, percent-clipped=1.0 2023-11-25 22:18:06,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.01 vs. limit=15.0 2023-11-25 22:18:10,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3087953.3333333335, ans=0.0 2023-11-25 22:18:10,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.64 vs. limit=22.5 2023-11-25 22:18:15,274 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463200 2023-11-25 22:18:21,355 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6300, loss[loss=0.05784, simple_loss=0.07546, pruned_loss=0.009676, audio_tagging_loss=0.01043, over 15547.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.0906, pruned_loss=0.01261, audio_tagging_loss=0.009192, over 3054067.55 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:18:27,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3088020.0, ans=0.0 2023-11-25 22:18:45,146 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:18:52,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3088153.3333333335, ans=0.09899494936611666 2023-11-25 22:19:11,729 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463250 2023-11-25 22:19:16,998 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6350, loss[loss=0.07744, simple_loss=0.1041, pruned_loss=0.01784, audio_tagging_loss=0.007542, over 15170.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09044, pruned_loss=0.01267, audio_tagging_loss=0.009221, over 3046914.16 frames. 
], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:19:25,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3088353.3333333335, ans=0.0 2023-11-25 22:19:29,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3088420.0, ans=0.125 2023-11-25 22:19:29,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3088420.0, ans=0.1 2023-11-25 22:19:49,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3088553.3333333335, ans=0.07 2023-11-25 22:19:56,125 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.465e+01 9.257e+01 9.775e+01 1.191e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-25 22:20:00,475 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:20:06,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3088620.0, ans=0.1 2023-11-25 22:20:07,180 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463300 2023-11-25 22:20:12,518 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6400, loss[loss=0.06582, simple_loss=0.08914, pruned_loss=0.01167, audio_tagging_loss=0.00958, over 15758.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09064, pruned_loss=0.01262, audio_tagging_loss=0.009258, over 3042179.86 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:20:41,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.85 vs. limit=15.0 2023-11-25 22:20:48,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.21 vs. limit=10.0 2023-11-25 22:20:51,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3088886.6666666665, ans=0.125 2023-11-25 22:20:52,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3088886.6666666665, ans=0.125 2023-11-25 22:20:55,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3088953.3333333335, ans=0.0 2023-11-25 22:21:01,536 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463350 2023-11-25 22:21:06,708 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6450, loss[loss=0.04914, simple_loss=0.06037, pruned_loss=0.008014, audio_tagging_loss=0.01094, over 14047.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.08998, pruned_loss=0.01262, audio_tagging_loss=0.009431, over 3040410.20 frames. 
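The Whitening lines compare a per-module statistic against a limit (metric=5.21 vs. limit=10.0 above): the metric is about 1.0 when the grouped feature covariance is isotropic and grows as the features degenerate, and a corrective penalty engages only above the limit. A rough reimplementation of such a metric, approximate rather than verbatim scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (num_frames, num_channels); channels are split into groups
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)          # center per group
        cov = torch.matmul(x.transpose(1, 2), x) / num_frames
        mean_diag = torch.diagonal(cov, dim1=1, dim2=2).mean()
        mean_sq = (cov ** 2).sum() / (num_groups * cpg)
        # equals 1.0 for cov proportional to identity; larger = less white
        return mean_sq / (mean_diag ** 2 + 1e-20)
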
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:21:10,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3089020.0, ans=0.125 2023-11-25 22:21:20,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3089086.6666666665, ans=0.0 2023-11-25 22:21:30,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3089153.3333333335, ans=0.125 2023-11-25 22:21:45,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.513e+01 9.181e+01 1.004e+02 1.135e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-25 22:21:56,995 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463400 2023-11-25 22:22:03,078 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6500, loss[loss=0.05184, simple_loss=0.06504, pruned_loss=0.007977, audio_tagging_loss=0.01135, over 15441.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08891, pruned_loss=0.01236, audio_tagging_loss=0.00939, over 3032805.51 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:22:20,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2023-11-25 22:22:35,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3089553.3333333335, ans=0.125 2023-11-25 22:22:41,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3089553.3333333335, ans=0.025 2023-11-25 22:22:43,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3089553.3333333335, ans=0.125 2023-11-25 22:22:52,365 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463450 2023-11-25 22:22:58,095 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6550, loss[loss=0.07713, simple_loss=0.105, pruned_loss=0.01563, audio_tagging_loss=0.009019, over 15759.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08881, pruned_loss=0.01242, audio_tagging_loss=0.009272, over 3031136.97 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:22:58,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3089686.6666666665, ans=0.125 2023-11-25 22:23:06,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3089686.6666666665, ans=0.0 2023-11-25 22:23:36,505 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.654e+01 9.097e+01 9.635e+01 1.212e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-25 22:23:43,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3089953.3333333335, ans=0.0 2023-11-25 22:23:47,523 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463500 2023-11-25 22:23:52,789 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6600, loss[loss=0.06925, simple_loss=0.08458, pruned_loss=0.0153, audio_tagging_loss=0.01166, over 14780.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08908, pruned_loss=0.01248, audio_tagging_loss=0.009139, over 3043749.85 frames. 
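The learning rate in these records creeps from 1.74e-03 to 1.73e-03, consistent with a smooth power-law decay in both batch index and epoch such as icefall's Eden scheduler; the log itself does not name the scheduler, so treat this shape and its constants as assumptions:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float, lr_epochs: float) -> float:
        # power-law decay in both batch and epoch; all constants assumed
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, 462_500, 39, 7500.0, 3.5))  # ~1.7e-03 with these guesses
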
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:23:55,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3090020.0, ans=0.0 2023-11-25 22:24:08,443 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:24:20,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3090153.3333333335, ans=0.1 2023-11-25 22:24:26,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3090220.0, ans=0.2 2023-11-25 22:24:35,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3090220.0, ans=0.125 2023-11-25 22:24:37,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-25 22:24:40,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2023-11-25 22:24:43,428 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463550 2023-11-25 22:24:49,278 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6650, loss[loss=0.07155, simple_loss=0.1014, pruned_loss=0.01367, audio_tagging_loss=0.0072, over 14796.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08992, pruned_loss=0.01251, audio_tagging_loss=0.009057, over 3044292.14 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:25:14,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3090486.6666666665, ans=0.0 2023-11-25 22:25:27,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.685e+01 9.321e+01 9.946e+01 1.416e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-25 22:25:38,694 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463600 2023-11-25 22:25:44,053 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6700, loss[loss=0.06851, simple_loss=0.09, pruned_loss=0.01147, audio_tagging_loss=0.01203, over 15652.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09042, pruned_loss=0.01253, audio_tagging_loss=0.008945, over 3046512.24 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:25:46,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3090686.6666666665, ans=0.125 2023-11-25 22:25:58,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3090753.3333333335, ans=0.125 2023-11-25 22:26:03,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3090753.3333333335, ans=0.1 2023-11-25 22:26:17,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2023-11-25 22:26:24,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. 
limit=15.0 2023-11-25 22:26:33,547 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463650 2023-11-25 22:26:36,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3090953.3333333335, ans=0.0 2023-11-25 22:26:38,703 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6750, loss[loss=0.04721, simple_loss=0.05811, pruned_loss=0.007917, audio_tagging_loss=0.01023, over 15044.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08982, pruned_loss=0.01249, audio_tagging_loss=0.008939, over 3039800.13 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:27:02,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3091153.3333333335, ans=0.0 2023-11-25 22:27:05,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3091153.3333333335, ans=0.0 2023-11-25 22:27:07,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.68 vs. limit=22.5 2023-11-25 22:27:09,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3091153.3333333335, ans=0.0 2023-11-25 22:27:17,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.613e+01 9.173e+01 9.716e+01 1.152e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-25 22:27:28,300 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463700 2023-11-25 22:27:32,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3091286.6666666665, ans=0.0 2023-11-25 22:27:33,918 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6800, loss[loss=0.06465, simple_loss=0.0878, pruned_loss=0.013, audio_tagging_loss=0.007749, over 14947.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08888, pruned_loss=0.0125, audio_tagging_loss=0.008973, over 3036598.45 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:27:52,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3091420.0, ans=15.0 2023-11-25 22:27:58,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3091486.6666666665, ans=0.125 2023-11-25 22:28:08,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3091553.3333333335, ans=0.0 2023-11-25 22:28:19,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2023-11-25 22:28:21,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3091620.0, ans=0.2 2023-11-25 22:28:22,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3091620.0, ans=0.1 2023-11-25 22:28:23,732 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463750 2023-11-25 22:28:28,854 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6850, loss[loss=0.06011, simple_loss=0.07568, pruned_loss=0.0101, audio_tagging_loss=0.01217, over 13975.00 frames. 
], tot_loss[loss=0.06592, simple_loss=0.08891, pruned_loss=0.01247, audio_tagging_loss=0.009001, over 3045424.70 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:28:31,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3091686.6666666665, ans=0.125 2023-11-25 22:28:33,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3091686.6666666665, ans=0.125 2023-11-25 22:28:38,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=22.5 2023-11-25 22:28:40,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3091753.3333333335, ans=0.2 2023-11-25 22:28:47,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3091753.3333333335, ans=0.125 2023-11-25 22:28:48,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3091753.3333333335, ans=0.0 2023-11-25 22:28:51,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3091820.0, ans=0.0 2023-11-25 22:29:07,616 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.919e+01 8.654e+01 9.393e+01 1.015e+02 1.220e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 22:29:12,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=22.5 2023-11-25 22:29:15,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.30 vs. limit=10.0 2023-11-25 22:29:18,080 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463800 2023-11-25 22:29:21,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3091953.3333333335, ans=0.1 2023-11-25 22:29:23,605 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6900, loss[loss=0.06725, simple_loss=0.09637, pruned_loss=0.01243, audio_tagging_loss=0.006639, over 16275.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.0895, pruned_loss=0.01254, audio_tagging_loss=0.008971, over 3050098.79 frames. 
], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:29:34,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3092086.6666666665, ans=0.1 2023-11-25 22:29:37,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3092086.6666666665, ans=0.125 2023-11-25 22:29:39,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3092086.6666666665, ans=0.04949747468305833 2023-11-25 22:29:40,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3092086.6666666665, ans=0.125 2023-11-25 22:29:57,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3092220.0, ans=0.1 2023-11-25 22:30:03,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3092220.0, ans=0.0 2023-11-25 22:30:08,453 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:30:13,781 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463850 2023-11-25 22:30:20,047 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6950, loss[loss=0.06764, simple_loss=0.08629, pruned_loss=0.0134, audio_tagging_loss=0.0111, over 16087.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09037, pruned_loss=0.0127, audio_tagging_loss=0.008995, over 3048419.20 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:30:32,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3092420.0, ans=0.125 2023-11-25 22:30:46,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-25 22:30:49,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3092486.6666666665, ans=0.125 2023-11-25 22:30:58,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.697e+01 9.205e+01 9.794e+01 1.442e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-25 22:31:07,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=22.5 2023-11-25 22:31:08,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3092620.0, ans=0.125 2023-11-25 22:31:09,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. 
limit=10.0 2023-11-25 22:31:09,789 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463900 2023-11-25 22:31:12,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3092620.0, ans=0.125 2023-11-25 22:31:15,077 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7000, loss[loss=0.07307, simple_loss=0.104, pruned_loss=0.01377, audio_tagging_loss=0.007322, over 15818.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09025, pruned_loss=0.01265, audio_tagging_loss=0.009006, over 3053446.95 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:31:15,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2023-11-25 22:31:27,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-11-25 22:31:29,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3092753.3333333335, ans=0.0 2023-11-25 22:31:52,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2023-11-25 22:32:04,327 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463950 2023-11-25 22:32:05,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2023-11-25 22:32:09,447 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7050, loss[loss=0.07002, simple_loss=0.08972, pruned_loss=0.01446, audio_tagging_loss=0.0107, over 15357.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09001, pruned_loss=0.01262, audio_tagging_loss=0.009111, over 3050829.22 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:32:12,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3093020.0, ans=10.0 2023-11-25 22:32:17,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3093020.0, ans=0.0 2023-11-25 22:32:24,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3093086.6666666665, ans=0.1 2023-11-25 22:32:26,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=15.0 2023-11-25 22:32:27,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2023-11-25 22:32:39,899 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:32:48,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.460e+01 9.019e+01 9.979e+01 1.338e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-25 22:32:57,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3093286.6666666665, ans=0.1 2023-11-25 22:32:58,706 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464000 2023-11-25 22:33:07,451 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7100, loss[loss=0.07354, simple_loss=0.09614, pruned_loss=0.0139, audio_tagging_loss=0.01157, over 15319.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09012, pruned_loss=0.01271, audio_tagging_loss=0.009166, over 3052837.59 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:33:36,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3093486.6666666665, ans=0.125 2023-11-25 22:33:37,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3093486.6666666665, ans=0.2 2023-11-25 22:33:53,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.27 vs. limit=22.5 2023-11-25 22:33:57,200 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464050 2023-11-25 22:34:02,477 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7150, loss[loss=0.07038, simple_loss=0.09833, pruned_loss=0.01246, audio_tagging_loss=0.008751, over 14888.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09075, pruned_loss=0.01269, audio_tagging_loss=0.00921, over 3051909.76 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:34:19,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3093753.3333333335, ans=0.1 2023-11-25 22:34:23,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3093820.0, ans=0.1 2023-11-25 22:34:24,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3093820.0, ans=0.125 2023-11-25 22:34:33,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2023-11-25 22:34:40,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.669e+01 9.271e+01 1.002e+02 1.351e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-25 22:34:50,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3093953.3333333335, ans=0.95 2023-11-25 22:34:51,309 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464100 2023-11-25 22:34:54,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2023-11-25 22:34:56,590 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7200, loss[loss=0.06612, simple_loss=0.0948, pruned_loss=0.01156, audio_tagging_loss=0.007156, over 14298.00 frames. 
], tot_loss[loss=0.06705, simple_loss=0.09008, pruned_loss=0.01266, audio_tagging_loss=0.009352, over 3048202.43 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:35:05,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3094020.0, ans=0.1 2023-11-25 22:35:16,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2023-11-25 22:35:18,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3094153.3333333335, ans=0.1 2023-11-25 22:35:27,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3094153.3333333335, ans=0.125 2023-11-25 22:35:45,899 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464150 2023-11-25 22:35:51,593 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7250, loss[loss=0.06052, simple_loss=0.08606, pruned_loss=0.008448, audio_tagging_loss=0.009045, over 15328.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09074, pruned_loss=0.01274, audio_tagging_loss=0.009353, over 3044320.60 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:36:03,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3094420.0, ans=0.1 2023-11-25 22:36:25,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3094553.3333333335, ans=0.1 2023-11-25 22:36:30,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3094553.3333333335, ans=0.125 2023-11-25 22:36:31,149 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 8.827e+01 9.307e+01 1.005e+02 1.461e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-25 22:36:35,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3094620.0, ans=0.125 2023-11-25 22:36:42,609 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464200 2023-11-25 22:36:48,013 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7300, loss[loss=0.06502, simple_loss=0.08983, pruned_loss=0.0137, audio_tagging_loss=0.00641, over 14709.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09085, pruned_loss=0.0127, audio_tagging_loss=0.009231, over 3039866.55 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:36:54,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. 
limit=15.0 2023-11-25 22:36:59,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3094753.3333333335, ans=0.0 2023-11-25 22:37:12,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3094820.0, ans=0.0 2023-11-25 22:37:16,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3094820.0, ans=0.0 2023-11-25 22:37:21,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3094886.6666666665, ans=0.04949747468305833 2023-11-25 22:37:37,125 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464250 2023-11-25 22:37:41,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=22.5 2023-11-25 22:37:42,319 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7350, loss[loss=0.05529, simple_loss=0.07799, pruned_loss=0.01088, audio_tagging_loss=0.005412, over 15231.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.0902, pruned_loss=0.01273, audio_tagging_loss=0.009204, over 3036834.27 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:37:50,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3095020.0, ans=0.125 2023-11-25 22:37:53,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3095086.6666666665, ans=0.05 2023-11-25 22:38:00,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3095086.6666666665, ans=0.0 2023-11-25 22:38:17,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3095220.0, ans=0.2 2023-11-25 22:38:22,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3095220.0, ans=0.125 2023-11-25 22:38:23,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.376e+01 8.699e+01 9.551e+01 1.020e+02 2.458e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-25 22:38:31,655 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464300 2023-11-25 22:38:36,926 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7400, loss[loss=0.06119, simple_loss=0.08668, pruned_loss=0.008693, audio_tagging_loss=0.009157, over 15807.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09049, pruned_loss=0.01261, audio_tagging_loss=0.009077, over 3045872.71 frames. 
], batch size: 59, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:38:55,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3095420.0, ans=0.125 2023-11-25 22:39:05,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3095486.6666666665, ans=0.0 2023-11-25 22:39:15,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3095553.3333333335, ans=0.125 2023-11-25 22:39:20,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3095620.0, ans=0.125 2023-11-25 22:39:26,761 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464350 2023-11-25 22:39:32,920 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7450, loss[loss=0.04603, simple_loss=0.05276, pruned_loss=0.004291, audio_tagging_loss=0.01536, over 14946.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08988, pruned_loss=0.01254, audio_tagging_loss=0.009035, over 3035420.60 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:39:47,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3095753.3333333335, ans=0.95 2023-11-25 22:39:51,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3095753.3333333335, ans=0.0 2023-11-25 22:40:13,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.803e+01 9.393e+01 1.013e+02 1.307e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 22:40:18,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3095953.3333333335, ans=0.125 2023-11-25 22:40:21,924 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464400 2023-11-25 22:40:23,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3095953.3333333335, ans=0.0 2023-11-25 22:40:27,443 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7500, loss[loss=0.08052, simple_loss=0.1201, pruned_loss=0.01286, audio_tagging_loss=0.007596, over 13828.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09043, pruned_loss=0.01256, audio_tagging_loss=0.008939, over 3034191.48 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:40:32,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096020.0, ans=0.1 2023-11-25 22:40:40,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3096086.6666666665, ans=0.125 2023-11-25 22:41:01,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3096220.0, ans=0.1 2023-11-25 22:41:16,958 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464450 2023-11-25 22:41:22,276 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7550, loss[loss=0.05931, simple_loss=0.08672, pruned_loss=0.01002, audio_tagging_loss=0.005932, over 14926.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09083, pruned_loss=0.01272, audio_tagging_loss=0.008917, over 3028633.97 frames. 
], batch size: 53, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:41:24,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3096353.3333333335, ans=0.1 2023-11-25 22:41:30,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3096353.3333333335, ans=0.0 2023-11-25 22:41:53,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2023-11-25 22:42:02,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.821e+01 8.728e+01 9.410e+01 1.018e+02 1.180e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-25 22:42:06,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3096620.0, ans=15.0 2023-11-25 22:42:11,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096620.0, ans=0.1 2023-11-25 22:42:12,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464500 2023-11-25 22:42:18,120 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7600, loss[loss=0.05646, simple_loss=0.07166, pruned_loss=0.01094, audio_tagging_loss=0.009688, over 16002.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08937, pruned_loss=0.01256, audio_tagging_loss=0.008961, over 3032423.05 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:42:31,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3096753.3333333335, ans=0.025 2023-11-25 22:42:46,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.44 vs. limit=15.0 2023-11-25 22:42:52,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3096886.6666666665, ans=0.125 2023-11-25 22:42:52,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2023-11-25 22:42:55,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2023-11-25 22:43:07,850 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464550 2023-11-25 22:43:13,160 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7650, loss[loss=0.05769, simple_loss=0.07797, pruned_loss=0.007769, audio_tagging_loss=0.01094, over 15125.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.0885, pruned_loss=0.01247, audio_tagging_loss=0.008925, over 3031412.17 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:43:16,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3097020.0, ans=0.125 2023-11-25 22:43:20,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3097020.0, ans=0.2 2023-11-25 22:43:27,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. 
limit=15.0 2023-11-25 22:43:55,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.616e+01 9.118e+01 9.857e+01 1.270e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-25 22:43:58,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3097286.6666666665, ans=0.0 2023-11-25 22:44:02,523 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464600 2023-11-25 22:44:06,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.24 vs. limit=10.0 2023-11-25 22:44:08,191 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7700, loss[loss=0.07509, simple_loss=0.1094, pruned_loss=0.01413, audio_tagging_loss=0.006276, over 15169.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08851, pruned_loss=0.01237, audio_tagging_loss=0.008911, over 3036236.80 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:44:08,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3097353.3333333335, ans=0.0 2023-11-25 22:44:18,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5 2023-11-25 22:44:28,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3097420.0, ans=0.0 2023-11-25 22:44:28,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3097420.0, ans=0.125 2023-11-25 22:44:30,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3097486.6666666665, ans=0.125 2023-11-25 22:44:48,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3097553.3333333335, ans=0.0 2023-11-25 22:44:58,681 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464650 2023-11-25 22:45:04,314 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7750, loss[loss=0.05156, simple_loss=0.06348, pruned_loss=0.009613, audio_tagging_loss=0.0102, over 16101.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.0877, pruned_loss=0.0124, audio_tagging_loss=0.00907, over 3034646.54 frames. 
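Each batch line reports two loss tuples: loss[...] for the current batch and tot_loss[...] as an aggregate whose frame count hovers around 3.0e6 rather than growing with the epoch, which is consistent with a frames-weighted running average that gradually forgets old batches. A sketch under that assumption; the decay constant and exact bookkeeping inside train_asr.py are not visible in the log.

class RunningLoss:
    # Frames-weighted running average with exponential forgetting.
    def __init__(self, decay=0.999):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.weighted_loss / self.frames, self.frames  # (tot_loss, frames)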
], batch size: 63, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:45:29,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3097820.0, ans=0.0 2023-11-25 22:45:32,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3097820.0, ans=15.0 2023-11-25 22:45:34,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3097820.0, ans=0.0 2023-11-25 22:45:35,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3097886.6666666665, ans=0.125 2023-11-25 22:45:38,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3097886.6666666665, ans=0.125 2023-11-25 22:45:46,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.755e+01 9.240e+01 9.987e+01 1.306e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-25 22:45:53,710 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464700 2023-11-25 22:45:55,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3097953.3333333335, ans=0.125 2023-11-25 22:45:59,389 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7800, loss[loss=0.0827, simple_loss=0.108, pruned_loss=0.01846, audio_tagging_loss=0.01023, over 14846.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08911, pruned_loss=0.01263, audio_tagging_loss=0.008956, over 3039368.43 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:46:11,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3098086.6666666665, ans=0.125 2023-11-25 22:46:18,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3098086.6666666665, ans=0.0 2023-11-25 22:46:39,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3098220.0, ans=0.025 2023-11-25 22:46:43,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3098286.6666666665, ans=0.125 2023-11-25 22:46:48,763 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464750 2023-11-25 22:46:54,006 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7850, loss[loss=0.06568, simple_loss=0.08868, pruned_loss=0.01012, audio_tagging_loss=0.01122, over 15918.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08933, pruned_loss=0.01259, audio_tagging_loss=0.009017, over 3039995.23 frames. 
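With this run's settings (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0, use_ctc=False), the logged components reproduce the logged total: loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. For batch 7800 above, 0.5 * 0.08911 + 0.01263 + 0.008956 = 0.06614, matching tot_loss=0.06613 up to rounding. (icefall recipes also warm up the pruned term early in training; by batch ~4.6e5 that factor is assumed to have reached 1.0.)

def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

print(total_loss(0.08911, 0.01263, 0.008956))  # ~0.06614, cf. 0.06613 above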
], batch size: 61, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:47:00,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3098353.3333333335, ans=0.125 2023-11-25 22:47:04,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3098420.0, ans=0.2 2023-11-25 22:47:08,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3098420.0, ans=0.125 2023-11-25 22:47:35,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 8.791e+01 9.341e+01 1.014e+02 1.334e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-25 22:47:43,172 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464800 2023-11-25 22:47:49,382 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7900, loss[loss=0.06668, simple_loss=0.09634, pruned_loss=0.01035, audio_tagging_loss=0.008161, over 15311.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08962, pruned_loss=0.01258, audio_tagging_loss=0.009148, over 3050952.14 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:48:12,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3098820.0, ans=0.125 2023-11-25 22:48:13,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3098820.0, ans=0.0 2023-11-25 22:48:15,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.59 vs. limit=10.0 2023-11-25 22:48:22,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=22.5 2023-11-25 22:48:38,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3098953.3333333335, ans=0.025 2023-11-25 22:48:39,179 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464850 2023-11-25 22:48:44,373 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7950, loss[loss=0.05779, simple_loss=0.07512, pruned_loss=0.0102, audio_tagging_loss=0.01003, over 14494.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.0896, pruned_loss=0.0126, audio_tagging_loss=0.009254, over 3052167.75 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:48:46,698 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:48:58,466 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 22:49:11,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3099153.3333333335, ans=0.1 2023-11-25 22:49:12,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3099153.3333333335, ans=0.0 2023-11-25 22:49:16,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3099153.3333333335, ans=0.125 2023-11-25 22:49:20,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3099220.0, ans=0.2 2023-11-25 22:49:22,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3099220.0, ans=0.0 2023-11-25 22:49:22,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3099220.0, ans=0.125 2023-11-25 22:49:26,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.696e+01 9.334e+01 1.006e+02 1.500e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-25 22:49:26,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3099220.0, ans=0.125 2023-11-25 22:49:31,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5 2023-11-25 22:49:34,211 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464900 2023-11-25 22:49:37,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2023-11-25 22:49:39,376 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8000, loss[loss=0.05872, simple_loss=0.08109, pruned_loss=0.009389, audio_tagging_loss=0.008784, over 15015.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09007, pruned_loss=0.01259, audio_tagging_loss=0.009288, over 3046749.54 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:49:45,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0 2023-11-25 22:49:47,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3099353.3333333335, ans=0.125 2023-11-25 22:50:28,795 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464950 2023-11-25 22:50:28,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3099620.0, ans=0.0 2023-11-25 22:50:34,977 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8050, loss[loss=0.08849, simple_loss=0.1194, pruned_loss=0.01999, audio_tagging_loss=0.008802, over 14959.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.08976, pruned_loss=0.0125, audio_tagging_loss=0.009319, over 3035444.79 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:50:42,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.80 vs. 
limit=22.5 2023-11-25 22:51:14,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3099886.6666666665, ans=0.0 2023-11-25 22:51:16,987 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.622e+01 9.226e+01 9.839e+01 1.205e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-25 22:51:24,907 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465000 2023-11-25 22:51:26,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3099953.3333333335, ans=0.09899494936611666 2023-11-25 22:51:30,380 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8100, loss[loss=0.06045, simple_loss=0.08722, pruned_loss=0.0103, audio_tagging_loss=0.006547, over 14895.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08949, pruned_loss=0.01251, audio_tagging_loss=0.00925, over 3032086.78 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:52:16,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2023-11-25 22:52:19,608 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465050 2023-11-25 22:52:24,795 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8150, loss[loss=0.07848, simple_loss=0.1083, pruned_loss=0.01348, audio_tagging_loss=0.01085, over 15171.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08999, pruned_loss=0.01259, audio_tagging_loss=0.009135, over 3036996.95 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:52:27,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3100353.3333333335, ans=0.125 2023-11-25 22:52:32,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3100353.3333333335, ans=0.0 2023-11-25 22:52:42,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3100420.0, ans=0.125 2023-11-25 22:52:50,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2023-11-25 22:52:53,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3100486.6666666665, ans=0.0 2023-11-25 22:53:00,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.84 vs. 
limit=12.0 2023-11-25 22:53:01,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3100553.3333333335, ans=0.125 2023-11-25 22:53:06,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.506e+01 9.069e+01 1.015e+02 1.632e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-25 22:53:09,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3100620.0, ans=0.1 2023-11-25 22:53:12,660 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:53:14,739 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465100 2023-11-25 22:53:20,590 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8200, loss[loss=0.07476, simple_loss=0.0954, pruned_loss=0.01756, audio_tagging_loss=0.009495, over 15172.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09113, pruned_loss=0.01267, audio_tagging_loss=0.009003, over 3043186.32 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:53:23,157 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:53:29,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3100686.6666666665, ans=0.0 2023-11-25 22:53:29,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3100686.6666666665, ans=0.0 2023-11-25 22:53:46,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3100820.0, ans=0.125 2023-11-25 22:53:51,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.23 vs. limit=22.5 2023-11-25 22:54:04,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3100953.3333333335, ans=0.1 2023-11-25 22:54:11,016 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465150 2023-11-25 22:54:16,190 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8250, loss[loss=0.05521, simple_loss=0.0846, pruned_loss=0.005162, audio_tagging_loss=0.007752, over 15478.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09032, pruned_loss=0.01255, audio_tagging_loss=0.008977, over 3039129.67 frames. 
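The "Exclude cut" warnings drop 1-second AudioSet cuts whose transcript is the dummy placeholder: 100 fbank frames shrink to 23 after the ~4x convolutional subsampling, fewer than the 24 BPE tokens, and the transducer loss needs at least one encoder frame per output token. A sketch of the check; the subsampled-length formula below is an assumption chosen to reproduce the logged 100 -> 23, not icefall's exact frontend arithmetic.

def frames_after_subsampling(t: int) -> int:
    # assumed conv frontend with overall factor ~4 plus context trimming
    return ((t - 3) // 2 - 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100), keep_cut(100, 24))  # 23 False -> excluded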
], batch size: 59, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:54:33,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3101086.6666666665, ans=0.125 2023-11-25 22:54:58,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.606e+01 9.259e+01 1.021e+02 1.240e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-25 22:55:04,730 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465200 2023-11-25 22:55:08,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3101286.6666666665, ans=0.125 2023-11-25 22:55:10,188 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8300, loss[loss=0.07592, simple_loss=0.1011, pruned_loss=0.01837, audio_tagging_loss=0.006982, over 15025.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.0894, pruned_loss=0.01245, audio_tagging_loss=0.009009, over 3039463.98 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:55:11,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=22.5 2023-11-25 22:55:15,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3101353.3333333335, ans=0.2 2023-11-25 22:55:42,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3101553.3333333335, ans=0.125 2023-11-25 22:55:57,026 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:55:58,980 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465250 2023-11-25 22:55:59,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3101620.0, ans=0.09899494936611666 2023-11-25 22:56:04,596 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8350, loss[loss=0.09162, simple_loss=0.1287, pruned_loss=0.01908, audio_tagging_loss=0.008209, over 15277.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09014, pruned_loss=0.0125, audio_tagging_loss=0.008952, over 3045649.04 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:56:07,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3101686.6666666665, ans=0.0 2023-11-25 22:56:09,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3101686.6666666665, ans=0.0 2023-11-25 22:56:19,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3101753.3333333335, ans=0.125 2023-11-25 22:56:20,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3101753.3333333335, ans=0.0 2023-11-25 22:56:46,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.512e+01 9.293e+01 1.012e+02 1.242e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 22:56:54,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465300 2023-11-25 22:56:59,884 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8400, loss[loss=0.06805, simple_loss=0.0937, pruned_loss=0.01576, audio_tagging_loss=0.005447, over 14842.00 frames. 
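The many balancer readings (min_positive=0.025, max_positive=0.95, max_abs=10.0, prob ans=0.125) belong to Zipformer's Balancer modules, which watch per-channel activation statistics and, with the scheduled probability "prob", nudge out-of-range channels back during the backward pass. The sketch below only computes the statistics being policed; icefall's actual correction modifies gradients rather than adding a loss, so treat this as a diagnostic, not the mechanism.

import torch

def channel_stats(x: torch.Tensor):
    # x: (frames, channels) -> fraction of positive values and mean |x| per channel
    return (x > 0).float().mean(dim=0), x.abs().mean(dim=0)

def out_of_range(x, min_positive=0.025, max_positive=0.95, max_abs=10.0):
    frac_pos, mean_abs = channel_stats(x)
    bad = (frac_pos < min_positive) | (frac_pos > max_positive) | (mean_abs > max_abs)
    return bad.nonzero().flatten()

x = torch.randn(2000, 256)
print(out_of_range(x).numel(), "channels out of range")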
], tot_loss[loss=0.0668, simple_loss=0.09071, pruned_loss=0.01257, audio_tagging_loss=0.008876, over 3054279.77 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:57:05,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3102020.0, ans=0.2 2023-11-25 22:57:16,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3102086.6666666665, ans=0.0 2023-11-25 22:57:18,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3102086.6666666665, ans=0.125 2023-11-25 22:57:19,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3102153.3333333335, ans=0.1 2023-11-25 22:57:24,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3102153.3333333335, ans=0.0 2023-11-25 22:57:48,444 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465350 2023-11-25 22:57:53,640 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8450, loss[loss=0.0637, simple_loss=0.07885, pruned_loss=0.0135, audio_tagging_loss=0.01077, over 15979.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09036, pruned_loss=0.01243, audio_tagging_loss=0.008879, over 3049665.01 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:58:03,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3102420.0, ans=0.125 2023-11-25 22:58:15,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3102486.6666666665, ans=0.125 2023-11-25 22:58:35,877 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.915e+01 9.393e+01 9.975e+01 1.301e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 22:58:36,129 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:58:36,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3102620.0, ans=0.1 2023-11-25 22:58:42,275 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465400 2023-11-25 22:58:47,827 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8500, loss[loss=0.06039, simple_loss=0.08012, pruned_loss=0.01134, audio_tagging_loss=0.009, over 14946.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09027, pruned_loss=0.01246, audio_tagging_loss=0.008886, over 3053768.39 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:58:48,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3102686.6666666665, ans=0.0 2023-11-25 22:59:05,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs. 
limit=22.5 2023-11-25 22:59:10,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3102820.0, ans=0.1 2023-11-25 22:59:11,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3102820.0, ans=0.125 2023-11-25 22:59:19,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.95 vs. limit=22.5 2023-11-25 22:59:37,816 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465450 2023-11-25 22:59:43,590 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8550, loss[loss=0.06482, simple_loss=0.08884, pruned_loss=0.01183, audio_tagging_loss=0.008571, over 16169.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09043, pruned_loss=0.01246, audio_tagging_loss=0.008898, over 3049551.88 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:59:52,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3103020.0, ans=0.125 2023-11-25 23:00:07,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3103153.3333333335, ans=0.125 2023-11-25 23:00:18,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3103220.0, ans=0.125 2023-11-25 23:00:25,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.599e+01 9.050e+01 9.776e+01 1.276e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-25 23:00:27,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3103286.6666666665, ans=0.1 2023-11-25 23:00:32,220 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465500 2023-11-25 23:00:37,341 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8600, loss[loss=0.05753, simple_loss=0.07392, pruned_loss=0.0106, audio_tagging_loss=0.009967, over 15197.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09099, pruned_loss=0.01269, audio_tagging_loss=0.00896, over 3048366.25 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:01:00,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3103486.6666666665, ans=0.0 2023-11-25 23:01:26,003 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465550 2023-11-25 23:01:29,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3103620.0, ans=0.0 2023-11-25 23:01:31,131 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8650, loss[loss=0.09376, simple_loss=0.1231, pruned_loss=0.02357, audio_tagging_loss=0.008654, over 15075.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09128, pruned_loss=0.01277, audio_tagging_loss=0.008975, over 3050348.70 frames. 
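The grad_scale field in the batch lines is the loss scale of mixed-precision training (use_fp16=True): it moves between 8.0, 16.0, and 32.0 in this section because the scaler doubles after a long run of overflow-free steps and halves when a step overflows. The standard PyTorch mechanism looks like the following; the growth/backoff settings shown are PyTorch defaults, not necessarily this recipe's.

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000
)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # silently skips the update if grads overflowed
    scaler.update()          # grows or backs off the scale
    return loss.detach(), scaler.get_scale()  # get_scale() == logged grad_scale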
], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:01:34,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3103686.6666666665, ans=0.0 2023-11-25 23:02:04,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3103886.6666666665, ans=0.2 2023-11-25 23:02:13,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.616e+01 9.272e+01 9.852e+01 1.304e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-25 23:02:13,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3103953.3333333335, ans=0.0 2023-11-25 23:02:19,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3103953.3333333335, ans=0.2 2023-11-25 23:02:20,398 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465600 2023-11-25 23:02:25,783 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8700, loss[loss=0.07633, simple_loss=0.1042, pruned_loss=0.01519, audio_tagging_loss=0.009023, over 14844.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09056, pruned_loss=0.01271, audio_tagging_loss=0.009178, over 3053551.49 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:02:35,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3104020.0, ans=0.1 2023-11-25 23:02:52,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3104153.3333333335, ans=0.125 2023-11-25 23:02:54,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3104153.3333333335, ans=0.125 2023-11-25 23:03:15,447 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465650 2023-11-25 23:03:18,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3104286.6666666665, ans=0.125 2023-11-25 23:03:20,574 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8750, loss[loss=0.07198, simple_loss=0.1029, pruned_loss=0.01351, audio_tagging_loss=0.007048, over 14881.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09175, pruned_loss=0.01304, audio_tagging_loss=0.009112, over 3051200.30 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:03:23,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3104353.3333333335, ans=0.125 2023-11-25 23:03:29,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3104353.3333333335, ans=0.0 2023-11-25 23:03:32,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3104420.0, ans=0.0 2023-11-25 23:03:32,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3104420.0, ans=0.125 2023-11-25 23:03:34,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3104420.0, ans=0.125 2023-11-25 23:03:34,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. 
limit=15.0 2023-11-25 23:03:39,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3104420.0, ans=0.125 2023-11-25 23:04:03,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.696e+01 9.362e+01 9.858e+01 1.375e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-25 23:04:07,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3104620.0, ans=0.1 2023-11-25 23:04:09,624 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465700 2023-11-25 23:04:11,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3104620.0, ans=0.125 2023-11-25 23:04:14,840 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8800, loss[loss=0.0864, simple_loss=0.1239, pruned_loss=0.01871, audio_tagging_loss=0.005753, over 15731.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09127, pruned_loss=0.01294, audio_tagging_loss=0.009202, over 3048522.43 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:04:28,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3104753.3333333335, ans=0.0 2023-11-25 23:04:48,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3104886.6666666665, ans=0.125 2023-11-25 23:04:51,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=15.0 2023-11-25 23:04:56,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=15.0 2023-11-25 23:05:04,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465750 2023-11-25 23:05:10,609 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8850, loss[loss=0.07264, simple_loss=0.104, pruned_loss=0.01159, audio_tagging_loss=0.009038, over 15275.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.09166, pruned_loss=0.01292, audio_tagging_loss=0.009233, over 3043503.32 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:05:15,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3105020.0, ans=0.125 2023-11-25 23:05:23,099 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 23:05:53,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.482e+01 9.169e+01 1.001e+02 1.243e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-25 23:05:54,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3105286.6666666665, ans=0.0 2023-11-25 23:06:00,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465800 2023-11-25 23:06:01,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3105286.6666666665, ans=0.0 2023-11-25 23:06:04,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3105353.3333333335, ans=0.2 2023-11-25 23:06:06,439 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8900, loss[loss=0.07916, simple_loss=0.1124, pruned_loss=0.01565, audio_tagging_loss=0.007302, over 16605.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.09211, pruned_loss=0.01279, audio_tagging_loss=0.009139, over 3048855.56 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:06:10,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3105353.3333333335, ans=0.1 2023-11-25 23:06:23,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0 2023-11-25 23:06:27,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3105486.6666666665, ans=0.2 2023-11-25 23:06:27,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3105486.6666666665, ans=0.125 2023-11-25 23:06:40,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3105553.3333333335, ans=10.0 2023-11-25 23:06:55,569 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465850 2023-11-25 23:07:00,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3105686.6666666665, ans=0.125 2023-11-25 23:07:00,862 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8950, loss[loss=0.06727, simple_loss=0.09579, pruned_loss=0.008338, audio_tagging_loss=0.01104, over 15763.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09185, pruned_loss=0.01281, audio_tagging_loss=0.008987, over 3049834.38 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:07:01,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3105686.6666666665, ans=0.125 2023-11-25 23:07:11,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3105753.3333333335, ans=0.1 2023-11-25 23:07:14,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3105753.3333333335, ans=0.1 2023-11-25 23:07:18,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.22 vs. 
limit=15.0 2023-11-25 23:07:33,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3105886.6666666665, ans=0.0 2023-11-25 23:07:43,927 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.637e+01 9.614e+01 1.032e+02 1.612e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-25 23:07:48,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=15.0 2023-11-25 23:07:50,281 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465900 2023-11-25 23:07:56,440 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9000, loss[loss=0.07877, simple_loss=0.1153, pruned_loss=0.01215, audio_tagging_loss=0.008983, over 15599.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09185, pruned_loss=0.01281, audio_tagging_loss=0.00893, over 3043703.98 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:07:56,440 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-25 23:08:20,347 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9813, 3.1986, 2.9762, 3.2131, 3.4101, 2.8168, 3.4855, 2.8006], device='cuda:1') 2023-11-25 23:08:28,206 INFO [train_asr.py:1267] (1/4) Epoch 39, validation: loss=0.05899, simple_loss=0.0507, pruned_loss=0.005227, audio_tagging_loss=0.02841, over 4681554.00 frames. 2023-11-25 23:08:28,207 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-25 23:08:39,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3106086.6666666665, ans=0.125 2023-11-25 23:09:01,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3106220.0, ans=0.0 2023-11-25 23:09:17,797 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465950 2023-11-25 23:09:18,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3106286.6666666665, ans=0.0 2023-11-25 23:09:21,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3106286.6666666665, ans=0.125 2023-11-25 23:09:22,964 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9050, loss[loss=0.05984, simple_loss=0.07463, pruned_loss=0.01217, audio_tagging_loss=0.01036, over 14918.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09097, pruned_loss=0.01267, audio_tagging_loss=0.008938, over 3048807.07 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:09:28,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. 
limit=15.0 2023-11-25 23:09:31,173 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:09:48,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3106486.6666666665, ans=0.125 2023-11-25 23:09:58,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3106553.3333333335, ans=0.125 2023-11-25 23:10:04,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3106553.3333333335, ans=0.125 2023-11-25 23:10:06,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3106620.0, ans=0.125 2023-11-25 23:10:07,084 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.869e+01 9.445e+01 1.003e+02 1.420e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-25 23:10:12,407 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466000 2023-11-25 23:10:18,647 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9100, loss[loss=0.05907, simple_loss=0.0823, pruned_loss=0.01057, audio_tagging_loss=0.007349, over 15805.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09104, pruned_loss=0.01287, audio_tagging_loss=0.008852, over 3055291.98 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:10:31,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3106753.3333333335, ans=0.0 2023-11-25 23:10:45,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3106820.0, ans=0.0 2023-11-25 23:11:08,267 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466050 2023-11-25 23:11:13,470 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9150, loss[loss=0.06387, simple_loss=0.08565, pruned_loss=0.009907, audio_tagging_loss=0.01114, over 15262.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09095, pruned_loss=0.01283, audio_tagging_loss=0.008824, over 3051894.85 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:11:18,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.07 vs. limit=12.0 2023-11-25 23:11:19,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3107020.0, ans=0.125 2023-11-25 23:11:27,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3107086.6666666665, ans=0.0 2023-11-25 23:11:31,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2023-11-25 23:11:48,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3107220.0, ans=0.2 2023-11-25 23:11:56,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.490e+01 9.148e+01 9.794e+01 1.489e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-25 23:12:00,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.88 vs. 
limit=15.0 2023-11-25 23:12:02,208 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466100 2023-11-25 23:12:06,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3107286.6666666665, ans=0.125 2023-11-25 23:12:07,890 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9200, loss[loss=0.068, simple_loss=0.09381, pruned_loss=0.01115, audio_tagging_loss=0.00995, over 16061.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09097, pruned_loss=0.01283, audio_tagging_loss=0.008786, over 3058468.16 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:12:24,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5 2023-11-25 23:12:24,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3107420.0, ans=0.125 2023-11-25 23:12:32,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3107486.6666666665, ans=0.125 2023-11-25 23:12:32,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3107486.6666666665, ans=0.125 2023-11-25 23:12:35,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3107486.6666666665, ans=0.125 2023-11-25 23:12:39,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=22.5 2023-11-25 23:12:42,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2023-11-25 23:12:53,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3107620.0, ans=0.125 2023-11-25 23:12:54,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3107620.0, ans=0.1 2023-11-25 23:12:57,044 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466150 2023-11-25 23:12:59,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.37 vs. limit=15.0 2023-11-25 23:13:03,216 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9250, loss[loss=0.07307, simple_loss=0.1087, pruned_loss=0.01176, audio_tagging_loss=0.00695, over 15822.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09096, pruned_loss=0.01281, audio_tagging_loss=0.00875, over 3060774.70 frames. 
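During the validation pass at batch 9000 above, zipformer.py:1877 prints attn_weights_entropy tensors with one value per attention head: the average entropy of each head's attention distribution, a quick health check (near zero means collapsed, peaky attention; large means diffuse attention). A hedged sketch of that diagnostic, assuming attention weights of shape (heads, target, source).

import torch

def attention_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn rows are softmax distributions over source positions
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, target)
    return ent.mean(dim=-1)  # per-head mean entropy, as in the logged tensors

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attention_entropy(attn))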
], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:13:13,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3107753.3333333335, ans=0.95 2023-11-25 23:13:15,513 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:13:34,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3107886.6666666665, ans=10.0 2023-11-25 23:13:34,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3107886.6666666665, ans=0.1 2023-11-25 23:13:46,850 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.554e+01 9.246e+01 1.012e+02 1.216e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-25 23:13:52,723 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466200 2023-11-25 23:13:58,062 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9300, loss[loss=0.08235, simple_loss=0.1164, pruned_loss=0.01661, audio_tagging_loss=0.00752, over 14925.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09112, pruned_loss=0.01282, audio_tagging_loss=0.00874, over 3061539.22 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:14:02,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3108020.0, ans=0.125 2023-11-25 23:14:13,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3108086.6666666665, ans=0.2 2023-11-25 23:14:13,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3108086.6666666665, ans=0.125 2023-11-25 23:14:14,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3108086.6666666665, ans=0.09899494936611666 2023-11-25 23:14:14,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-11-25 23:14:23,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0 2023-11-25 23:14:27,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3108153.3333333335, ans=0.0 2023-11-25 23:14:46,882 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466250 2023-11-25 23:14:47,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3108286.6666666665, ans=0.0 2023-11-25 23:14:52,132 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9350, loss[loss=0.06132, simple_loss=0.08311, pruned_loss=0.01006, audio_tagging_loss=0.009703, over 14712.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09107, pruned_loss=0.01278, audio_tagging_loss=0.008799, over 3063314.83 frames. 
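The Whitening lines (scaling.py:1022) compare a per-module "metric" against a "limit": the metric measures how far the channel covariance of a layer's output is from a scaled identity, and the whitening constraint only activates when the metric exceeds the limit (so "14.21 vs. limit=22.5" is purely informational). One plausible whiteness statistic is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which equals 1.0 for perfectly white features; icefall's exact formula may differ.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (frames, channels)
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(4000, 256)
print(float(whitening_metric(x)))  # close to 1 for uncorrelated channels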
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:15:12,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3108420.0, ans=0.125 2023-11-25 23:15:15,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2023-11-25 23:15:18,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0 2023-11-25 23:15:34,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3108620.0, ans=0.025 2023-11-25 23:15:36,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.512e+01 9.083e+01 9.779e+01 1.171e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-25 23:15:41,154 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466300 2023-11-25 23:15:42,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3108620.0, ans=0.125 2023-11-25 23:15:46,823 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9400, loss[loss=0.05983, simple_loss=0.08178, pruned_loss=0.01122, audio_tagging_loss=0.007723, over 15992.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09089, pruned_loss=0.01278, audio_tagging_loss=0.00893, over 3055968.22 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:15:52,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3108686.6666666665, ans=0.125 2023-11-25 23:15:56,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3108753.3333333335, ans=0.09899494936611666 2023-11-25 23:15:58,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3108753.3333333335, ans=0.5 2023-11-25 23:16:04,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2023-11-25 23:16:35,601 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466350 2023-11-25 23:16:41,286 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9450, loss[loss=0.07166, simple_loss=0.09429, pruned_loss=0.01449, audio_tagging_loss=0.01002, over 14666.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.0912, pruned_loss=0.01262, audio_tagging_loss=0.0091, over 3052082.30 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:16:42,350 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 23:16:45,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3109020.0, ans=0.0 2023-11-25 23:16:47,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3109020.0, ans=0.1 2023-11-25 23:16:48,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-11-25 23:16:59,390 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:17:05,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3109153.3333333335, ans=0.1 2023-11-25 23:17:07,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3109153.3333333335, ans=0.125 2023-11-25 23:17:25,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3109286.6666666665, ans=0.125 2023-11-25 23:17:25,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.965e+01 8.505e+01 9.184e+01 9.882e+01 1.417e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-25 23:17:30,179 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466400 2023-11-25 23:17:31,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3109286.6666666665, ans=0.125 2023-11-25 23:17:35,701 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9500, loss[loss=0.06962, simple_loss=0.08821, pruned_loss=0.01677, audio_tagging_loss=0.008748, over 15424.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09098, pruned_loss=0.01276, audio_tagging_loss=0.009189, over 3042500.19 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:17:55,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3109420.0, ans=0.125 2023-11-25 23:17:58,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3109486.6666666665, ans=0.2 2023-11-25 23:18:09,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3109553.3333333335, ans=0.0 2023-11-25 23:18:13,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0 2023-11-25 23:18:24,961 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466450 2023-11-25 23:18:30,752 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9550, loss[loss=0.08109, simple_loss=0.1046, pruned_loss=0.01624, audio_tagging_loss=0.01254, over 15513.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09109, pruned_loss=0.01276, audio_tagging_loss=0.009246, over 3038711.59 frames. 
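The learning rate printed in every batch line (lr: 1.73e-03) follows icefall's Eden schedule, which decays with both the global batch index and the epoch. With base_lr=0.045, lr_batches=7500, lr_epochs=3.5, and assuming "epoch" counts completed epochs (38 while training epoch 39), the formula reproduces the logged value.

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Eden schedule as in icefall's optim.py (sketch; details may vary)
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

print(round(eden_lr(0.045, 464900, 38), 5))  # 0.00173, matching "lr: 1.73e-03"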
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:18:42,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3109753.3333333335, ans=0.125 2023-11-25 23:18:42,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3109753.3333333335, ans=0.125 2023-11-25 23:18:53,470 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:19:16,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.693e+01 9.287e+01 1.001e+02 1.223e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-25 23:19:20,376 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466500 2023-11-25 23:19:25,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3110020.0, ans=0.0 2023-11-25 23:19:26,138 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9600, loss[loss=0.08246, simple_loss=0.1129, pruned_loss=0.01843, audio_tagging_loss=0.007583, over 15650.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09075, pruned_loss=0.01286, audio_tagging_loss=0.009403, over 3045342.07 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:19:35,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=12.0 2023-11-25 23:19:56,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3110153.3333333335, ans=0.05 2023-11-25 23:19:58,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3110220.0, ans=0.125 2023-11-25 23:20:14,827 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466550 2023-11-25 23:20:17,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3110286.6666666665, ans=0.125 2023-11-25 23:20:20,027 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9650, loss[loss=0.061, simple_loss=0.09351, pruned_loss=0.008489, audio_tagging_loss=0.005755, over 15716.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.0895, pruned_loss=0.01254, audio_tagging_loss=0.009404, over 3040966.93 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:20:25,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3110353.3333333335, ans=0.2 2023-11-25 23:20:26,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3110353.3333333335, ans=0.0 2023-11-25 23:20:41,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3110486.6666666665, ans=0.1 2023-11-25 23:20:49,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.97 vs. 
limit=15.0 2023-11-25 23:21:03,175 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:21:05,143 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 8.886e+01 9.411e+01 1.006e+02 1.308e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-25 23:21:09,283 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466600 2023-11-25 23:21:10,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3110620.0, ans=0.125 2023-11-25 23:21:10,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3110620.0, ans=0.125 2023-11-25 23:21:14,688 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9700, loss[loss=0.07186, simple_loss=0.09749, pruned_loss=0.01406, audio_tagging_loss=0.009049, over 15122.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08966, pruned_loss=0.01256, audio_tagging_loss=0.009276, over 3048129.00 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:21:43,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3110820.0, ans=0.0 2023-11-25 23:21:59,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3110953.3333333335, ans=0.2 2023-11-25 23:22:00,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0 2023-11-25 23:22:04,881 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466650 2023-11-25 23:22:11,105 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9750, loss[loss=0.07012, simple_loss=0.0943, pruned_loss=0.01384, audio_tagging_loss=0.00913, over 15329.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08985, pruned_loss=0.01245, audio_tagging_loss=0.00913, over 3048884.50 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:22:12,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3111020.0, ans=0.125 2023-11-25 23:22:21,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3111086.6666666665, ans=0.2 2023-11-25 23:22:29,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=22.5 2023-11-25 23:22:32,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3111153.3333333335, ans=0.0 2023-11-25 23:22:32,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3111153.3333333335, ans=0.125 2023-11-25 23:22:46,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.15 vs. limit=15.0 2023-11-25 23:22:47,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3111220.0, ans=0.125 2023-11-25 23:22:55,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.81 vs. 
limit=15.0 2023-11-25 23:22:56,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3111286.6666666665, ans=10.0 2023-11-25 23:22:57,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.598e+01 9.280e+01 1.031e+02 1.262e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-25 23:23:00,452 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466700 2023-11-25 23:23:03,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3111286.6666666665, ans=0.1 2023-11-25 23:23:05,700 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9800, loss[loss=0.07657, simple_loss=0.1166, pruned_loss=0.01198, audio_tagging_loss=0.006283, over 15901.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09068, pruned_loss=0.01265, audio_tagging_loss=0.009007, over 3042582.68 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:23:11,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3111353.3333333335, ans=0.2 2023-11-25 23:23:22,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3111420.0, ans=0.0 2023-11-25 23:23:33,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3111486.6666666665, ans=0.125 2023-11-25 23:23:37,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3111486.6666666665, ans=0.0 2023-11-25 23:23:55,145 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466750 2023-11-25 23:23:56,127 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:24:00,427 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9850, loss[loss=0.09288, simple_loss=0.1284, pruned_loss=0.02278, audio_tagging_loss=0.005884, over 14491.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09211, pruned_loss=0.01307, audio_tagging_loss=0.008834, over 3044518.46 frames. 
], batch size: 52, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:24:07,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3111686.6666666665, ans=0.1 2023-11-25 23:24:34,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3111886.6666666665, ans=0.0 2023-11-25 23:24:38,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3111886.6666666665, ans=0.2 2023-11-25 23:24:45,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.652e+01 9.205e+01 1.019e+02 1.596e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-25 23:24:50,041 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466800 2023-11-25 23:24:51,258 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:24:54,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3112020.0, ans=0.125 2023-11-25 23:24:55,436 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9900, loss[loss=0.04796, simple_loss=0.05935, pruned_loss=0.00674, audio_tagging_loss=0.01154, over 14657.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09132, pruned_loss=0.01283, audio_tagging_loss=0.008859, over 3047138.34 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:24:56,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3112020.0, ans=0.125 2023-11-25 23:25:05,071 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:25:24,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3112153.3333333335, ans=0.125 2023-11-25 23:25:26,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3112153.3333333335, ans=0.1 2023-11-25 23:25:43,943 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:25:45,815 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466850 2023-11-25 23:25:45,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3112286.6666666665, ans=0.125 2023-11-25 23:25:49,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3112286.6666666665, ans=0.125 2023-11-25 23:25:51,040 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9950, loss[loss=0.07164, simple_loss=0.09222, pruned_loss=0.01443, audio_tagging_loss=0.0111, over 15186.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.0914, pruned_loss=0.0129, audio_tagging_loss=0.00883, over 3047271.38 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:25:56,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. 
limit=15.0 2023-11-25 23:26:06,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3112420.0, ans=0.125 2023-11-25 23:26:09,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3112420.0, ans=0.125 2023-11-25 23:26:10,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3112420.0, ans=0.0 2023-11-25 23:26:31,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3112553.3333333335, ans=0.125 2023-11-25 23:26:36,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0 2023-11-25 23:26:37,244 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.531e+01 9.197e+01 9.885e+01 1.494e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-25 23:26:40,483 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466900 2023-11-25 23:26:45,710 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10000, loss[loss=0.06822, simple_loss=0.0955, pruned_loss=0.01269, audio_tagging_loss=0.00778, over 14294.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.0912, pruned_loss=0.01281, audio_tagging_loss=0.008822, over 3036393.65 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:26:46,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3112686.6666666665, ans=0.125 2023-11-25 23:26:51,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3112686.6666666665, ans=0.125 2023-11-25 23:26:59,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3112753.3333333335, ans=0.0 2023-11-25 23:27:06,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5 2023-11-25 23:27:34,933 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466950 2023-11-25 23:27:41,127 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10050, loss[loss=0.06549, simple_loss=0.08812, pruned_loss=0.01431, audio_tagging_loss=0.00712, over 15023.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09034, pruned_loss=0.01264, audio_tagging_loss=0.008905, over 3038680.87 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:27:51,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.88 vs. 
limit=15.0 2023-11-25 23:27:54,892 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:28:22,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3113220.0, ans=0.0 2023-11-25 23:28:28,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.554e+01 9.112e+01 9.756e+01 1.275e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-25 23:28:30,648 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467000 2023-11-25 23:28:33,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.97 vs. limit=22.5 2023-11-25 23:28:34,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2023-11-25 23:28:36,536 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10100, loss[loss=0.07238, simple_loss=0.09143, pruned_loss=0.01695, audio_tagging_loss=0.009713, over 15218.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09083, pruned_loss=0.01255, audio_tagging_loss=0.00893, over 3047489.63 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:28:42,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3113353.3333333335, ans=0.125 2023-11-25 23:28:52,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3113420.0, ans=0.07 2023-11-25 23:28:58,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3113486.6666666665, ans=0.125 2023-11-25 23:29:22,923 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:29:25,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3113620.0, ans=0.125 2023-11-25 23:29:26,097 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467050 2023-11-25 23:29:26,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3113620.0, ans=0.125 2023-11-25 23:29:31,201 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10150, loss[loss=0.07249, simple_loss=0.1019, pruned_loss=0.01454, audio_tagging_loss=0.007018, over 15508.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09088, pruned_loss=0.01258, audio_tagging_loss=0.009045, over 3048601.39 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:29:38,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. 
limit=6.0 2023-11-25 23:29:49,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3113753.3333333335, ans=0.125 2023-11-25 23:29:58,933 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:30:16,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3113953.3333333335, ans=0.0 2023-11-25 23:30:18,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.705e+01 9.387e+01 9.994e+01 1.374e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-25 23:30:20,427 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467100 2023-11-25 23:30:26,714 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10200, loss[loss=0.09488, simple_loss=0.1309, pruned_loss=0.02447, audio_tagging_loss=0.004944, over 15451.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08995, pruned_loss=0.01248, audio_tagging_loss=0.009135, over 3043022.75 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:30:28,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3114020.0, ans=0.2 2023-11-25 23:30:31,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3114020.0, ans=0.1 2023-11-25 23:30:31,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3114020.0, ans=0.0 2023-11-25 23:30:31,216 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:30:33,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3114020.0, ans=0.125 2023-11-25 23:30:42,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3114086.6666666665, ans=0.0 2023-11-25 23:30:45,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3114086.6666666665, ans=0.0 2023-11-25 23:30:49,587 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:30:49,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.70 vs. 
limit=15.0 2023-11-25 23:30:57,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3114153.3333333335, ans=0.125 2023-11-25 23:31:16,501 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467150 2023-11-25 23:31:21,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5 2023-11-25 23:31:21,709 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10250, loss[loss=0.07507, simple_loss=0.1048, pruned_loss=0.01437, audio_tagging_loss=0.008322, over 16054.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09071, pruned_loss=0.01263, audio_tagging_loss=0.009134, over 3050233.60 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:31:22,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=22.5 2023-11-25 23:31:38,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3114420.0, ans=0.09899494936611666 2023-11-25 23:31:48,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3114486.6666666665, ans=0.0 2023-11-25 23:31:56,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3114553.3333333335, ans=10.0 2023-11-25 23:31:59,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3114553.3333333335, ans=0.025 2023-11-25 23:32:08,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2023-11-25 23:32:08,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.876e+01 9.394e+01 1.009e+02 1.335e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 23:32:11,012 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467200 2023-11-25 23:32:17,000 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10300, loss[loss=0.07045, simple_loss=0.09087, pruned_loss=0.01154, audio_tagging_loss=0.01348, over 14385.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09096, pruned_loss=0.01265, audio_tagging_loss=0.009266, over 3048744.60 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:32:30,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3114753.3333333335, ans=0.0 2023-11-25 23:32:35,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3114753.3333333335, ans=0.125 2023-11-25 23:32:38,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3114820.0, ans=0.0 2023-11-25 23:32:48,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3114886.6666666665, ans=0.0 2023-11-25 23:32:50,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.89 vs. 
limit=10.0 2023-11-25 23:32:59,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3114886.6666666665, ans=0.2 2023-11-25 23:33:06,202 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467250 2023-11-25 23:33:10,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3115020.0, ans=0.125 2023-11-25 23:33:10,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3115020.0, ans=0.0 2023-11-25 23:33:11,800 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10350, loss[loss=0.06499, simple_loss=0.08282, pruned_loss=0.01118, audio_tagging_loss=0.0124, over 16684.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09129, pruned_loss=0.01257, audio_tagging_loss=0.009268, over 3057763.52 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:33:14,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3115020.0, ans=0.0 2023-11-25 23:33:59,133 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.695e+01 9.211e+01 9.915e+01 1.210e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-25 23:34:01,269 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467300 2023-11-25 23:34:02,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3115286.6666666665, ans=0.0 2023-11-25 23:34:04,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3115286.6666666665, ans=0.125 2023-11-25 23:34:06,991 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10400, loss[loss=0.06703, simple_loss=0.08943, pruned_loss=0.01167, audio_tagging_loss=0.01065, over 15057.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09092, pruned_loss=0.01257, audio_tagging_loss=0.009422, over 3059497.58 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:34:20,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3115420.0, ans=0.1 2023-11-25 23:34:22,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3115420.0, ans=0.0 2023-11-25 23:34:29,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3115486.6666666665, ans=0.125 2023-11-25 23:34:43,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. limit=6.0 2023-11-25 23:34:50,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3115620.0, ans=0.125 2023-11-25 23:34:55,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3115620.0, ans=0.5 2023-11-25 23:34:56,625 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467350 2023-11-25 23:35:01,764 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10450, loss[loss=0.07675, simple_loss=0.1131, pruned_loss=0.0145, audio_tagging_loss=0.00571, over 16372.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09109, pruned_loss=0.01254, audio_tagging_loss=0.009297, over 3058932.36 frames. 
], batch size: 61, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:35:06,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. limit=10.0 2023-11-25 23:35:09,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3115686.6666666665, ans=0.0 2023-11-25 23:35:22,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3115753.3333333335, ans=0.125 2023-11-25 23:35:47,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3115953.3333333335, ans=0.0 2023-11-25 23:35:49,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.667e+01 9.396e+01 1.018e+02 1.785e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 23:35:51,413 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467400 2023-11-25 23:35:55,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=12.0 2023-11-25 23:35:56,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-25 23:35:56,814 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10500, loss[loss=0.0707, simple_loss=0.08874, pruned_loss=0.01636, audio_tagging_loss=0.009979, over 14900.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09066, pruned_loss=0.01261, audio_tagging_loss=0.009265, over 3052515.55 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:36:04,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.20 vs. limit=22.5 2023-11-25 23:36:18,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2023-11-25 23:36:19,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3116153.3333333335, ans=0.125 2023-11-25 23:36:19,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3116153.3333333335, ans=0.125 2023-11-25 23:36:22,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3116153.3333333335, ans=0.0 2023-11-25 23:36:42,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3116286.6666666665, ans=0.2 2023-11-25 23:36:46,895 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467450 2023-11-25 23:36:47,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.64 vs. limit=15.0 2023-11-25 23:36:52,574 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10550, loss[loss=0.0544, simple_loss=0.06609, pruned_loss=0.01194, audio_tagging_loss=0.009414, over 14104.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09075, pruned_loss=0.01265, audio_tagging_loss=0.00917, over 3045620.71 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:37:01,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.84 vs. limit=22.5 2023-11-25 23:37:02,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3116420.0, ans=0.2 2023-11-25 23:37:04,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3116420.0, ans=0.2 2023-11-25 23:37:07,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3116420.0, ans=0.125 2023-11-25 23:37:11,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3116420.0, ans=0.0 2023-11-25 23:37:13,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3116486.6666666665, ans=0.1 2023-11-25 23:37:40,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.690e+01 9.247e+01 9.972e+01 1.800e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-25 23:37:41,720 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467500 2023-11-25 23:37:46,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3116686.6666666665, ans=0.125 2023-11-25 23:37:46,812 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10600, loss[loss=0.07538, simple_loss=0.0988, pruned_loss=0.01714, audio_tagging_loss=0.008842, over 15089.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09025, pruned_loss=0.01274, audio_tagging_loss=0.009075, over 3043418.27 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:37:48,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3116686.6666666665, ans=10.0 2023-11-25 23:37:50,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3116686.6666666665, ans=0.2 2023-11-25 23:38:06,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2023-11-25 23:38:30,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=22.5 2023-11-25 23:38:31,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3116953.3333333335, ans=0.125 2023-11-25 23:38:35,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0 2023-11-25 23:38:35,968 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467550 2023-11-25 23:38:41,683 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10650, loss[loss=0.05667, simple_loss=0.07495, pruned_loss=0.009639, audio_tagging_loss=0.009557, over 14905.00 frames. ], tot_loss[loss=0.067, simple_loss=0.0904, pruned_loss=0.01278, audio_tagging_loss=0.009014, over 3037962.79 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:38:54,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3117086.6666666665, ans=0.125 2023-11-25 23:38:57,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3117086.6666666665, ans=0.1 2023-11-25 23:39:11,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2023-11-25 23:39:17,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3117220.0, ans=0.0 2023-11-25 23:39:30,338 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.807e+01 9.255e+01 1.012e+02 1.355e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-25 23:39:31,436 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467600 2023-11-25 23:39:36,809 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10700, loss[loss=0.05231, simple_loss=0.07244, pruned_loss=0.007699, audio_tagging_loss=0.008391, over 15437.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09063, pruned_loss=0.01267, audio_tagging_loss=0.008957, over 3044433.13 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:39:39,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3117353.3333333335, ans=0.0 2023-11-25 23:39:52,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3117420.0, ans=0.125 2023-11-25 23:39:53,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3117420.0, ans=0.125 2023-11-25 23:39:55,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3117420.0, ans=0.0 2023-11-25 23:40:26,119 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467650 2023-11-25 23:40:26,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-25 23:40:28,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3117620.0, ans=0.125 2023-11-25 23:40:31,271 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10750, loss[loss=0.06064, simple_loss=0.08346, pruned_loss=0.009482, audio_tagging_loss=0.00943, over 14714.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09072, pruned_loss=0.01276, audio_tagging_loss=0.008919, over 3043643.02 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:40:32,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3117686.6666666665, ans=0.125 2023-11-25 23:40:35,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.67 vs. 
limit=15.0 2023-11-25 23:41:07,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3117886.6666666665, ans=0.0 2023-11-25 23:41:12,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3117886.6666666665, ans=0.125 2023-11-25 23:41:19,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.994e+01 8.803e+01 9.280e+01 9.939e+01 1.365e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-25 23:41:20,289 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467700 2023-11-25 23:41:25,499 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10800, loss[loss=0.06013, simple_loss=0.07548, pruned_loss=0.009565, audio_tagging_loss=0.01282, over 15054.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09061, pruned_loss=0.01276, audio_tagging_loss=0.008825, over 3045916.46 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:41:34,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2023-11-25 23:41:58,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3118220.0, ans=0.0 2023-11-25 23:42:15,803 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467750 2023-11-25 23:42:16,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3118286.6666666665, ans=0.1 2023-11-25 23:42:18,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3118286.6666666665, ans=0.1 2023-11-25 23:42:20,986 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10850, loss[loss=0.06497, simple_loss=0.07874, pruned_loss=0.01226, audio_tagging_loss=0.01334, over 14889.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09059, pruned_loss=0.01283, audio_tagging_loss=0.008813, over 3040914.03 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:42:23,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3118353.3333333335, ans=0.125 2023-11-25 23:42:29,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3118353.3333333335, ans=0.1 2023-11-25 23:42:48,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3118486.6666666665, ans=0.1 2023-11-25 23:43:09,850 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.755e+01 9.381e+01 1.019e+02 1.994e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-25 23:43:09,938 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467800 2023-11-25 23:43:14,301 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 23:43:15,316 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10900, loss[loss=0.06012, simple_loss=0.08529, pruned_loss=0.00999, audio_tagging_loss=0.007487, over 15360.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09106, pruned_loss=0.01279, audio_tagging_loss=0.008829, over 3049535.44 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:43:28,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3118753.3333333335, ans=0.125 2023-11-25 23:43:28,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=12.0 2023-11-25 23:43:34,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3118753.3333333335, ans=0.125 2023-11-25 23:43:38,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3118820.0, ans=0.125 2023-11-25 23:43:59,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.70 vs. limit=12.0 2023-11-25 23:44:01,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3118953.3333333335, ans=0.125 2023-11-25 23:44:01,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3118953.3333333335, ans=0.1 2023-11-25 23:44:04,165 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467850 2023-11-25 23:44:09,318 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10950, loss[loss=0.06809, simple_loss=0.09469, pruned_loss=0.0116, audio_tagging_loss=0.009152, over 15506.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09148, pruned_loss=0.01279, audio_tagging_loss=0.008845, over 3050499.70 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:44:16,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.90 vs. 
limit=10.0 2023-11-25 23:44:18,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3119020.0, ans=0.125 2023-11-25 23:44:23,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3119086.6666666665, ans=0.0 2023-11-25 23:44:39,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3119153.3333333335, ans=0.1 2023-11-25 23:44:43,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3119220.0, ans=0.125 2023-11-25 23:44:57,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3119286.6666666665, ans=0.07 2023-11-25 23:44:58,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.372e+01 9.128e+01 9.666e+01 1.249e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-25 23:44:58,298 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467900 2023-11-25 23:45:04,511 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11000, loss[loss=0.05214, simple_loss=0.0735, pruned_loss=0.006843, audio_tagging_loss=0.008546, over 14747.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09116, pruned_loss=0.01266, audio_tagging_loss=0.00886, over 3045251.93 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:45:15,973 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:45:22,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3119420.0, ans=0.0 2023-11-25 23:45:31,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3119486.6666666665, ans=0.0 2023-11-25 23:45:31,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3119486.6666666665, ans=0.04949747468305833 2023-11-25 23:45:33,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5 2023-11-25 23:45:42,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3119553.3333333335, ans=0.0 2023-11-25 23:45:42,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.19 vs. 
limit=15.0 2023-11-25 23:45:53,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3119620.0, ans=0.125 2023-11-25 23:45:54,402 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467950 2023-11-25 23:45:56,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3119620.0, ans=0.0 2023-11-25 23:45:59,572 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11050, loss[loss=0.07803, simple_loss=0.1082, pruned_loss=0.01501, audio_tagging_loss=0.008889, over 15339.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09102, pruned_loss=0.01272, audio_tagging_loss=0.008936, over 3045321.84 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:46:02,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2023-11-25 23:46:05,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3119686.6666666665, ans=0.125 2023-11-25 23:46:09,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3119753.3333333335, ans=0.125 2023-11-25 23:46:15,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3119753.3333333335, ans=0.125 2023-11-25 23:46:25,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.84 vs. limit=10.0 2023-11-25 23:46:33,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3119886.6666666665, ans=0.1 2023-11-25 23:46:42,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2023-11-25 23:46:48,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.692e+01 9.297e+01 1.029e+02 1.368e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 23:46:48,432 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468000 2023-11-25 23:46:55,486 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11100, loss[loss=0.0794, simple_loss=0.1026, pruned_loss=0.01802, audio_tagging_loss=0.01007, over 15719.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09115, pruned_loss=0.01273, audio_tagging_loss=0.009094, over 3046032.22 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:47:16,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3120153.3333333335, ans=0.2 2023-11-25 23:47:37,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3120220.0, ans=0.2 2023-11-25 23:47:39,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3120286.6666666665, ans=0.125 2023-11-25 23:47:44,333 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468050 2023-11-25 23:47:50,028 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11150, loss[loss=0.08241, simple_loss=0.1104, pruned_loss=0.01882, audio_tagging_loss=0.008408, over 14828.00 frames. 
], tot_loss[loss=0.06717, simple_loss=0.09055, pruned_loss=0.01275, audio_tagging_loss=0.009151, over 3042629.84 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:47:52,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=22.5 2023-11-25 23:48:04,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3120420.0, ans=0.1 2023-11-25 23:48:25,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=12.0 2023-11-25 23:48:36,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3120620.0, ans=0.125 2023-11-25 23:48:38,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.667e+01 9.262e+01 9.903e+01 1.395e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-25 23:48:38,280 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468100 2023-11-25 23:48:43,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.05 vs. limit=22.5 2023-11-25 23:48:43,928 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11200, loss[loss=0.05511, simple_loss=0.0822, pruned_loss=0.005608, audio_tagging_loss=0.008399, over 15470.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08945, pruned_loss=0.01248, audio_tagging_loss=0.00926, over 3041872.31 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 32.0 2023-11-25 23:48:47,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.61 vs. limit=15.0 2023-11-25 23:48:49,362 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:48:50,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3120686.6666666665, ans=0.0 2023-11-25 23:48:52,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3120686.6666666665, ans=0.07 2023-11-25 23:49:04,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3120820.0, ans=0.125 2023-11-25 23:49:11,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3120820.0, ans=0.0 2023-11-25 23:49:31,646 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468150 2023-11-25 23:49:36,750 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11250, loss[loss=0.07102, simple_loss=0.09247, pruned_loss=0.0141, audio_tagging_loss=0.01069, over 15292.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08849, pruned_loss=0.01233, audio_tagging_loss=0.009205, over 3052706.93 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:49:38,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.74 vs. 
limit=10.0 2023-11-25 23:49:41,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3121020.0, ans=0.0 2023-11-25 23:49:52,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3121086.6666666665, ans=0.09899494936611666 2023-11-25 23:50:08,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=15.0 2023-11-25 23:50:08,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=15.0 2023-11-25 23:50:11,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2023-11-25 23:50:17,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3121220.0, ans=0.2 2023-11-25 23:50:25,548 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468200 2023-11-25 23:50:26,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.668e+01 9.346e+01 1.011e+02 2.547e+02, threshold=1.869e+02, percent-clipped=1.0 2023-11-25 23:50:31,483 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11300, loss[loss=0.06718, simple_loss=0.0842, pruned_loss=0.01614, audio_tagging_loss=0.008934, over 14365.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08962, pruned_loss=0.01266, audio_tagging_loss=0.008997, over 3061430.85 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:50:35,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3121353.3333333335, ans=0.0 2023-11-25 23:50:39,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3121353.3333333335, ans=10.0 2023-11-25 23:50:48,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2023-11-25 23:50:55,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.46 vs. limit=22.5 2023-11-25 23:50:55,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3121486.6666666665, ans=0.04949747468305833 2023-11-25 23:51:20,343 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468250 2023-11-25 23:51:25,991 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11350, loss[loss=0.05915, simple_loss=0.07924, pruned_loss=0.01187, audio_tagging_loss=0.007661, over 15051.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09037, pruned_loss=0.01277, audio_tagging_loss=0.008851, over 3057038.93 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:51:33,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3121686.6666666665, ans=0.0 2023-11-25 23:52:11,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. 
limit=15.0 2023-11-25 23:52:15,194 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468300 2023-11-25 23:52:16,138 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.715e+01 9.313e+01 1.012e+02 1.423e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-25 23:52:20,315 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11400, loss[loss=0.07495, simple_loss=0.09897, pruned_loss=0.01677, audio_tagging_loss=0.008695, over 15022.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09, pruned_loss=0.01259, audio_tagging_loss=0.008793, over 3051535.96 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:52:26,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3122020.0, ans=0.0 2023-11-25 23:52:38,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3122086.6666666665, ans=0.05 2023-11-25 23:52:43,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3122153.3333333335, ans=0.1 2023-11-25 23:52:53,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.60 vs. limit=22.5 2023-11-25 23:52:55,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3122220.0, ans=0.1 2023-11-25 23:52:58,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3122220.0, ans=0.2 2023-11-25 23:53:08,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3122286.6666666665, ans=0.0 2023-11-25 23:53:09,034 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468350 2023-11-25 23:53:09,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3122286.6666666665, ans=0.125 2023-11-25 23:53:11,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3122286.6666666665, ans=0.125 2023-11-25 23:53:14,127 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11450, loss[loss=0.06996, simple_loss=0.1045, pruned_loss=0.01049, audio_tagging_loss=0.007205, over 15198.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09077, pruned_loss=0.01273, audio_tagging_loss=0.008848, over 3046625.82 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:53:35,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3122486.6666666665, ans=0.0 2023-11-25 23:54:03,227 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468400 2023-11-25 23:54:04,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.283e+01 9.283e+01 1.005e+02 1.593e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-25 23:54:09,206 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11500, loss[loss=0.06826, simple_loss=0.08282, pruned_loss=0.0161, audio_tagging_loss=0.01075, over 15117.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.0895, pruned_loss=0.0126, audio_tagging_loss=0.008925, over 3042881.38 frames. 
], batch size: 58, lr: 1.72e-03, grad_scale: 16.0
2023-11-25 23:54:19,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3122753.3333333335, ans=0.0
2023-11-25 23:54:20,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0
2023-11-25 23:54:26,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0
2023-11-25 23:54:27,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3122753.3333333335, ans=0.125
2023-11-25 23:54:27,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3122753.3333333335, ans=0.1
2023-11-25 23:54:33,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0
2023-11-25 23:54:57,824 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468450
2023-11-25 23:55:00,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3122953.3333333335, ans=0.125
2023-11-25 23:55:03,507 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11550, loss[loss=0.07417, simple_loss=0.1024, pruned_loss=0.01601, audio_tagging_loss=0.006953, over 15578.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09054, pruned_loss=0.01288, audio_tagging_loss=0.008895, over 3049994.39 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0
2023-11-25 23:55:12,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3123086.6666666665, ans=0.125
2023-11-25 23:55:19,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3123086.6666666665, ans=0.07
2023-11-25 23:55:24,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3123153.3333333335, ans=10.0
2023-11-25 23:55:30,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3123153.3333333335, ans=0.2
2023-11-25 23:55:38,139 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 23:55:47,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3123286.6666666665, ans=0.0
2023-11-25 23:55:47,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0
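The WARNING above documents the length filter applied before the transducer loss: this AudioSet cut has 100 feature frames, which the encoder frontend subsamples to 23, fewer than its 24 BPE tokens, so no monotonic alignment exists and the cut is skipped. A hedged sketch of such a predicate; the subsampling formula is an assumption that happens to reproduce 100 -> 23, and icefall's exact check may differ:

def keep_cut(num_frames_before: int, tokens: list) -> bool:
    # Assumed Conv2dSubsampling-style length formula:
    # ((100 - 7) // 2 + 1) // 2 = 23, matching the warning above
    num_frames_after = ((num_frames_before - 7) // 2 + 1) // 2
    # 23 frames cannot be aligned against 24 tokens, so this cut is excluded
    return num_frames_after >= len(tokens)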
2023-11-25 23:55:52,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468500
2023-11-25 23:55:53,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.807e+01 9.353e+01 9.870e+01 1.294e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-25 23:55:57,358 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11600, loss[loss=0.0749, simple_loss=0.09283, pruned_loss=0.01956, audio_tagging_loss=0.008926, over 15200.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09137, pruned_loss=0.01306, audio_tagging_loss=0.008856, over 3053834.25 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 32.0
2023-11-25 23:55:59,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3123353.3333333335, ans=0.125
2023-11-25 23:56:07,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3123420.0, ans=0.1
2023-11-25 23:56:09,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3123420.0, ans=0.0
2023-11-25 23:56:18,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3123486.6666666665, ans=0.125
2023-11-25 23:56:36,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3123553.3333333335, ans=0.125
2023-11-25 23:56:41,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0
2023-11-25 23:56:42,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3123620.0, ans=0.1
2023-11-25 23:56:47,266 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468550
2023-11-25 23:56:52,390 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11650, loss[loss=0.05691, simple_loss=0.07637, pruned_loss=0.009089, audio_tagging_loss=0.009641, over 14107.00 frames. ], tot_loss[loss=0.06771, simple_loss=0.0914, pruned_loss=0.01314, audio_tagging_loss=0.008871, over 3051921.12 frames. ], batch size: 53, lr: 1.72e-03, grad_scale: 16.0
2023-11-25 23:56:58,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3123686.6666666665, ans=0.0
2023-11-25 23:57:01,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3123686.6666666665, ans=0.2
2023-11-25 23:57:02,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.52 vs.
limit=10.0 2023-11-25 23:57:02,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3123753.3333333335, ans=0.125 2023-11-25 23:57:26,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3123886.6666666665, ans=0.0 2023-11-25 23:57:28,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3123886.6666666665, ans=0.2 2023-11-25 23:57:32,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3123886.6666666665, ans=0.2 2023-11-25 23:57:41,971 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468600 2023-11-25 23:57:44,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.612e+01 9.119e+01 9.760e+01 1.208e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-25 23:57:47,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=15.0 2023-11-25 23:57:47,434 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11700, loss[loss=0.0832, simple_loss=0.1145, pruned_loss=0.0191, audio_tagging_loss=0.006865, over 15164.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09146, pruned_loss=0.01321, audio_tagging_loss=0.00887, over 3054818.32 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:57:51,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3124020.0, ans=0.0 2023-11-25 23:57:55,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3124020.0, ans=0.2 2023-11-25 23:58:05,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2023-11-25 23:58:32,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3124286.6666666665, ans=0.125 2023-11-25 23:58:34,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3124286.6666666665, ans=0.04949747468305833 2023-11-25 23:58:36,897 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468650 2023-11-25 23:58:42,049 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11750, loss[loss=0.07338, simple_loss=0.09913, pruned_loss=0.01463, audio_tagging_loss=0.009188, over 15189.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09171, pruned_loss=0.01319, audio_tagging_loss=0.008873, over 3056152.97 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:59:03,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. 
limit=15.0 2023-11-25 23:59:05,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3124486.6666666665, ans=0.0 2023-11-25 23:59:10,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3124486.6666666665, ans=0.0 2023-11-25 23:59:14,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3124553.3333333335, ans=0.2 2023-11-25 23:59:32,135 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468700 2023-11-25 23:59:34,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.692e+01 9.354e+01 9.925e+01 1.548e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-25 23:59:37,270 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11800, loss[loss=0.07022, simple_loss=0.09632, pruned_loss=0.01417, audio_tagging_loss=0.00789, over 14830.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09086, pruned_loss=0.01292, audio_tagging_loss=0.008909, over 3052676.09 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:59:57,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3124820.0, ans=0.0 2023-11-25 23:59:57,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3124820.0, ans=0.2 2023-11-26 00:00:09,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=12.0 2023-11-26 00:00:20,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3124953.3333333335, ans=0.0 2023-11-26 00:00:25,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3124953.3333333335, ans=0.125 2023-11-26 00:00:26,140 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468750 2023-11-26 00:00:31,223 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11850, loss[loss=0.06179, simple_loss=0.08476, pruned_loss=0.009632, audio_tagging_loss=0.00978, over 13558.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09102, pruned_loss=0.01292, audio_tagging_loss=0.008968, over 3046645.79 frames. ], batch size: 52, lr: 1.72e-03, grad_scale: 16.0 2023-11-26 00:00:47,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.25 vs. limit=15.0 2023-11-26 00:01:08,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3125220.0, ans=0.1 2023-11-26 00:01:20,129 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468800 2023-11-26 00:01:22,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.741e+01 9.224e+01 1.012e+02 1.182e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 00:01:25,624 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11900, loss[loss=0.0541, simple_loss=0.07209, pruned_loss=0.00989, audio_tagging_loss=0.008168, over 15432.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09101, pruned_loss=0.01291, audio_tagging_loss=0.009096, over 3048107.80 frames. 
], batch size: 59, lr: 1.72e-03, grad_scale: 16.0 2023-11-26 00:01:33,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3125353.3333333335, ans=0.0 2023-11-26 00:01:35,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-11-26 00:02:15,006 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468850 2023-11-26 00:02:20,696 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11950, loss[loss=0.06587, simple_loss=0.08944, pruned_loss=0.01178, audio_tagging_loss=0.009368, over 14879.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09134, pruned_loss=0.01283, audio_tagging_loss=0.009099, over 3054921.94 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-26 00:02:20,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3125686.6666666665, ans=0.125 2023-11-26 00:02:23,088 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:02:27,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3125686.6666666665, ans=0.125 2023-11-26 00:02:28,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3125686.6666666665, ans=0.0 2023-11-26 00:02:35,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3125753.3333333335, ans=0.125 2023-11-26 00:02:36,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3125753.3333333335, ans=0.125 2023-11-26 00:02:39,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3125753.3333333335, ans=0.125 2023-11-26 00:02:43,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-26 00:03:06,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3125953.3333333335, ans=0.125 2023-11-26 00:03:09,040 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468900 2023-11-26 00:03:11,571 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.661e+01 9.250e+01 9.933e+01 1.391e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 00:03:14,636 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 12000, loss[loss=0.07294, simple_loss=0.1073, pruned_loss=0.01195, audio_tagging_loss=0.00736, over 15546.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.0912, pruned_loss=0.01275, audio_tagging_loss=0.009189, over 3052929.49 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 32.0 2023-11-26 00:03:14,637 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 00:03:47,120 INFO [train_asr.py:1267] (1/4) Epoch 39, validation: loss=0.05809, simple_loss=0.05065, pruned_loss=0.005132, audio_tagging_loss=0.02764, over 4681554.00 frames. 
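Across these records the total loss behaves like a fixed-weight combination of the three components; for the Epoch 39, batch 12000 record above, 0.5 * 0.0912 + 0.01275 + 0.009189 = 0.06754, matching the logged tot_loss. A quick check; the 0.5 and 1.0 weights are inferred from the logged numbers rather than taken from the training code:

simple_loss, pruned_loss, audio_tagging_loss = 0.0912, 0.01275, 0.009189
loss = 0.5 * simple_loss + 1.0 * pruned_loss + 1.0 * audio_tagging_loss
print(f"{loss:.5f}")  # 0.06754, as logged for Epoch 39, batch 12000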
2023-11-26 00:03:47,121 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 00:03:53,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3126020.0, ans=0.125 2023-11-26 00:03:53,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2023-11-26 00:03:54,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3126020.0, ans=0.125 2023-11-26 00:03:59,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3126086.6666666665, ans=0.125 2023-11-26 00:04:02,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3126086.6666666665, ans=0.0 2023-11-26 00:04:04,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2023-11-26 00:04:40,565 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 0, loss[loss=0.07285, simple_loss=0.09006, pruned_loss=0.006279, audio_tagging_loss=0.02154, over 14604.00 frames. ], tot_loss[loss=0.07285, simple_loss=0.09006, pruned_loss=0.006279, audio_tagging_loss=0.02154, over 14604.00 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:04:40,566 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 00:05:12,142 INFO [train_asr.py:1267] (1/4) Epoch 40, validation: loss=0.05782, simple_loss=0.05064, pruned_loss=0.005121, audio_tagging_loss=0.02738, over 4681554.00 frames. 2023-11-26 00:05:12,143 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 00:05:15,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5 2023-11-26 00:05:17,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.59 vs. limit=15.0 2023-11-26 00:05:22,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3126253.3333333335, ans=0.125 2023-11-26 00:05:27,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=22.5 2023-11-26 00:05:34,104 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468950 2023-11-26 00:05:40,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3126320.0, ans=0.2 2023-11-26 00:06:07,104 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 50, loss[loss=0.07837, simple_loss=0.09666, pruned_loss=0.01498, audio_tagging_loss=0.01506, over 15463.00 frames. ], tot_loss[loss=0.07753, simple_loss=0.09418, pruned_loss=0.0133, audio_tagging_loss=0.01714, over 685141.79 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:06:23,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. 
limit=15.0 2023-11-26 00:06:23,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3126586.6666666665, ans=0.125 2023-11-26 00:06:28,960 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469000 2023-11-26 00:06:31,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2023-11-26 00:06:32,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 9.207e+01 9.971e+01 1.067e+02 1.313e+02, threshold=1.994e+02, percent-clipped=0.0 2023-11-26 00:06:56,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3126786.6666666665, ans=0.05 2023-11-26 00:07:01,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3126853.3333333335, ans=0.5 2023-11-26 00:07:02,694 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 100, loss[loss=0.08364, simple_loss=0.1061, pruned_loss=0.01488, audio_tagging_loss=0.0157, over 15414.00 frames. ], tot_loss[loss=0.07431, simple_loss=0.09211, pruned_loss=0.0121, audio_tagging_loss=0.01615, over 1212810.28 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:07:03,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3126853.3333333335, ans=0.0 2023-11-26 00:07:10,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3126853.3333333335, ans=0.1 2023-11-26 00:07:14,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3126920.0, ans=0.125 2023-11-26 00:07:25,683 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469050 2023-11-26 00:07:35,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2023-11-26 00:07:35,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3127053.3333333335, ans=0.125 2023-11-26 00:07:40,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3127053.3333333335, ans=0.125 2023-11-26 00:07:41,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3127053.3333333335, ans=0.125 2023-11-26 00:07:42,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3127053.3333333335, ans=0.1 2023-11-26 00:07:48,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3127120.0, ans=0.125 2023-11-26 00:07:58,577 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 150, loss[loss=0.08531, simple_loss=0.1107, pruned_loss=0.01812, audio_tagging_loss=0.01184, over 15362.00 frames. ], tot_loss[loss=0.07305, simple_loss=0.09243, pruned_loss=0.0125, audio_tagging_loss=0.01434, over 1621443.54 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:08:04,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=22.5 2023-11-26 00:08:15,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3127253.3333333335, ans=0.0 2023-11-26 00:08:18,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3127253.3333333335, ans=0.2 2023-11-26 00:08:21,764 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469100 2023-11-26 00:08:24,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.020e+01 9.615e+01 1.041e+02 1.301e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 00:08:25,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3127320.0, ans=0.0 2023-11-26 00:08:37,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=12.0 2023-11-26 00:08:40,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.15 vs. limit=22.5 2023-11-26 00:08:46,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3127453.3333333335, ans=0.125 2023-11-26 00:08:54,984 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 200, loss[loss=0.07419, simple_loss=0.1078, pruned_loss=0.01375, audio_tagging_loss=0.006546, over 14974.00 frames. ], tot_loss[loss=0.07041, simple_loss=0.0911, pruned_loss=0.01196, audio_tagging_loss=0.0129, over 1934836.48 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:08:55,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3127520.0, ans=0.1 2023-11-26 00:09:03,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3127520.0, ans=0.0 2023-11-26 00:09:10,651 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:09:16,959 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469150 2023-11-26 00:09:17,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0 2023-11-26 00:09:27,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3127720.0, ans=0.0 2023-11-26 00:09:49,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3127853.3333333335, ans=0.09899494936611666 2023-11-26 00:09:50,288 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 250, loss[loss=0.07261, simple_loss=0.103, pruned_loss=0.01249, audio_tagging_loss=0.008649, over 15043.00 frames. ], tot_loss[loss=0.06982, simple_loss=0.09155, pruned_loss=0.0123, audio_tagging_loss=0.01175, over 2181666.38 frames. 
], batch size: 55, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 00:10:03,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3127920.0, ans=0.05 2023-11-26 00:10:05,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3127920.0, ans=0.0 2023-11-26 00:10:12,761 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469200 2023-11-26 00:10:17,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.769e+01 9.325e+01 1.022e+02 1.435e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 00:10:26,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3128053.3333333335, ans=0.1 2023-11-26 00:10:35,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3128120.0, ans=0.125 2023-11-26 00:10:36,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3128120.0, ans=0.125 2023-11-26 00:10:38,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3128120.0, ans=0.04949747468305833 2023-11-26 00:10:46,174 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 300, loss[loss=0.06952, simple_loss=0.09647, pruned_loss=0.01435, audio_tagging_loss=0.006939, over 14518.00 frames. ], tot_loss[loss=0.06953, simple_loss=0.09188, pruned_loss=0.01261, audio_tagging_loss=0.01098, over 2373349.66 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 00:10:51,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3128186.6666666665, ans=0.0 2023-11-26 00:10:56,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2023-11-26 00:11:09,525 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469250 2023-11-26 00:11:10,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.74 vs. limit=15.0 2023-11-26 00:11:20,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3128386.6666666665, ans=0.0 2023-11-26 00:11:36,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3128453.3333333335, ans=0.0 2023-11-26 00:11:42,981 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 350, loss[loss=0.05007, simple_loss=0.07262, pruned_loss=0.006326, audio_tagging_loss=0.007433, over 15900.00 frames. ], tot_loss[loss=0.06933, simple_loss=0.09233, pruned_loss=0.01276, audio_tagging_loss=0.01041, over 2525690.44 frames. 
], batch size: 62, lr: 1.70e-03, grad_scale: 8.0
2023-11-26 00:11:54,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3128586.6666666665, ans=0.125
2023-11-26 00:11:54,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3128586.6666666665, ans=0.025
2023-11-26 00:11:55,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3128586.6666666665, ans=0.125
2023-11-26 00:12:03,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3128653.3333333335, ans=0.0
2023-11-26 00:12:04,829 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469300
2023-11-26 00:12:05,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.22 vs. limit=15.0
2023-11-26 00:12:07,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3128653.3333333335, ans=0.125
2023-11-26 00:12:08,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 8.708e+01 9.325e+01 9.980e+01 1.485e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-26 00:12:30,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0
2023-11-26 00:12:32,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3128786.6666666665, ans=0.0
2023-11-26 00:12:38,373 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 400, loss[loss=0.06927, simple_loss=0.09694, pruned_loss=0.01304, audio_tagging_loss=0.007759, over 15481.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.09288, pruned_loss=0.01287, audio_tagging_loss=0.009872, over 2639566.61 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:12:38,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3128853.3333333335, ans=0.2
2023-11-26 00:12:38,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=22.5
2023-11-26 00:12:39,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3128853.3333333335, ans=0.125
2023-11-26 00:12:41,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3128853.3333333335, ans=0.2
2023-11-26 00:12:58,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3128920.0, ans=0.0
2023-11-26 00:13:00,073 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469350
2023-11-26 00:13:23,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.52 vs. limit=15.0
2023-11-26 00:13:29,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.02 vs. limit=22.5
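The scaling.py:1022 "Whitening" lines above each compare a per-module anisotropy metric against a limit (18.02 vs. 22.5 in the last record): the metric is 1.0 when the module's output covariance is a multiple of the identity and grows as a few directions dominate, and a penalty gradient is applied only while the metric sits above the limit. A rough reimplementation of the idea, assuming an eigenvalue-ratio form rather than the exact scaling.py code:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels); channels split into groups as in the log lines
    num_frames, num_channels = x.shape
    xg = x.reshape(num_frames, num_groups, num_channels // num_groups).transpose(0, 1)
    cov = xg.transpose(1, 2) @ xg / num_frames      # per-group feature covariance
    eigs = torch.linalg.eigvalsh(cov)               # real eigenvalues (symmetric cov)
    # mean squared eigenvalue over squared mean eigenvalue: 1.0 for white features
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

x = torch.randn(10000, 64)
print(whitening_metric(x, num_groups=1))  # close to 1.0 for white noise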
2023-11-26 00:13:32,811 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 450, loss[loss=0.0666, simple_loss=0.08902, pruned_loss=0.01216, audio_tagging_loss=0.009931, over 14461.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09143, pruned_loss=0.01257, audio_tagging_loss=0.009637, over 2728052.53 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:13:56,314 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469400
2023-11-26 00:14:00,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.743e+01 9.299e+01 9.864e+01 1.390e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-26 00:14:02,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3129320.0, ans=0.125
2023-11-26 00:14:03,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3129320.0, ans=0.2
2023-11-26 00:14:04,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3129320.0, ans=0.125
2023-11-26 00:14:11,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3129386.6666666665, ans=0.0
2023-11-26 00:14:14,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3129386.6666666665, ans=0.04949747468305833
2023-11-26 00:14:27,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3129453.3333333335, ans=0.0
2023-11-26 00:14:28,985 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 500, loss[loss=0.07197, simple_loss=0.09204, pruned_loss=0.01609, audio_tagging_loss=0.009865, over 16454.00 frames. ], tot_loss[loss=0.06864, simple_loss=0.0927, pruned_loss=0.0129, audio_tagging_loss=0.009391, over 2803719.99 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:14:29,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0
2023-11-26 00:14:37,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3129520.0, ans=0.2
2023-11-26 00:14:43,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0
2023-11-26 00:14:51,405 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469450
2023-11-26 00:15:24,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=8.0
2023-11-26 00:15:24,644 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 550, loss[loss=0.06667, simple_loss=0.08548, pruned_loss=0.01407, audio_tagging_loss=0.009856, over 13351.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09224, pruned_loss=0.01277, audio_tagging_loss=0.009395, over 2852454.44 frames.
], batch size: 53, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:15:32,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3129853.3333333335, ans=0.0
2023-11-26 00:15:38,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3129920.0, ans=0.125
2023-11-26 00:15:40,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3129920.0, ans=0.0
2023-11-26 00:15:46,708 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469500
2023-11-26 00:15:46,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3129986.6666666665, ans=0.125
2023-11-26 00:15:49,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3129986.6666666665, ans=0.2
2023-11-26 00:15:50,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.611e+01 9.176e+01 9.917e+01 4.186e+02, threshold=1.835e+02, percent-clipped=1.0
2023-11-26 00:16:02,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3130053.3333333335, ans=0.2
2023-11-26 00:16:03,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3130053.3333333335, ans=0.125
2023-11-26 00:16:19,920 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 600, loss[loss=0.06487, simple_loss=0.0913, pruned_loss=0.01031, audio_tagging_loss=0.008907, over 16768.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09171, pruned_loss=0.01275, audio_tagging_loss=0.009352, over 2895682.42 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:16:31,947 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 00:16:43,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469550
2023-11-26 00:16:45,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3130320.0, ans=0.1
2023-11-26 00:16:51,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3130320.0, ans=0.5
2023-11-26 00:17:02,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3130386.6666666665, ans=0.125
2023-11-26 00:17:15,653 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 00:17:16,593 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 650, loss[loss=0.06511, simple_loss=0.07966, pruned_loss=0.01336, audio_tagging_loss=0.01192, over 15575.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09174, pruned_loss=0.01287, audio_tagging_loss=0.009312, over 2928567.70 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
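The scaling.py:213 "ScheduledFloat" lines above report regularization hyperparameters (dropout probabilities, skip rates, min/max bounds) whose values are functions of the global batch count; by batch_count around 3.13e6 most of these schedules have flattened to their final value. A minimal sketch of such a schedule with made-up breakpoints; the real class lives in icefall's scaling.py and may differ in detail:

class ScheduledFloat:
    """A float hyperparameter interpolated piecewise-linearly in batch count."""
    def __init__(self, *points):  # points: sorted (batch_count, value) pairs
        self.points = list(points)

    def value_at(self, batch_count: float) -> float:
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint the value stays constant

skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value_at(3130053.33))  # 0.0, flat this late in training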
2023-11-26 00:17:19,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3130520.0, ans=0.1
2023-11-26 00:17:23,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3130520.0, ans=0.1
2023-11-26 00:17:29,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3130586.6666666665, ans=0.125
2023-11-26 00:17:39,106 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469600
2023-11-26 00:17:43,464 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.552e+01 9.119e+01 9.990e+01 1.151e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-26 00:18:12,543 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 700, loss[loss=0.07073, simple_loss=0.09743, pruned_loss=0.01522, audio_tagging_loss=0.006793, over 15096.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09054, pruned_loss=0.01271, audio_tagging_loss=0.009325, over 2950322.86 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:18:15,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.84 vs. limit=6.0
2023-11-26 00:18:18,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0
2023-11-26 00:18:34,871 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469650
2023-11-26 00:18:50,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.84 vs. limit=15.0
2023-11-26 00:18:51,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3131053.3333333335, ans=0.125
2023-11-26 00:19:03,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3131120.0, ans=10.0
2023-11-26 00:19:07,766 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 750, loss[loss=0.06184, simple_loss=0.0887, pruned_loss=0.01058, audio_tagging_loss=0.006903, over 14873.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08974, pruned_loss=0.01258, audio_tagging_loss=0.009273, over 2966952.69 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:19:29,612 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469700
2023-11-26 00:19:34,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.495e+01 9.390e+01 9.960e+01 1.200e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-26 00:20:01,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0
2023-11-26 00:20:03,199 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 800, loss[loss=0.07084, simple_loss=0.1033, pruned_loss=0.01184, audio_tagging_loss=0.007357, over 15896.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09099, pruned_loss=0.01279, audio_tagging_loss=0.009217, over 2981264.88 frames.
], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:20:06,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3131520.0, ans=0.05 2023-11-26 00:20:23,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2023-11-26 00:20:25,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469750 2023-11-26 00:20:28,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3131653.3333333335, ans=0.0 2023-11-26 00:20:30,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3131653.3333333335, ans=0.0 2023-11-26 00:20:35,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3131720.0, ans=0.125 2023-11-26 00:20:37,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3131720.0, ans=0.125 2023-11-26 00:20:52,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3131786.6666666665, ans=0.1 2023-11-26 00:20:59,438 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 850, loss[loss=0.06984, simple_loss=0.09756, pruned_loss=0.01203, audio_tagging_loss=0.009025, over 14303.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09136, pruned_loss=0.01303, audio_tagging_loss=0.009302, over 2995975.79 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:21:09,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3131920.0, ans=0.0 2023-11-26 00:21:21,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469800 2023-11-26 00:21:26,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.683e+01 9.047e+01 1.001e+02 1.303e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-26 00:21:33,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3132053.3333333335, ans=0.125 2023-11-26 00:21:36,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3132053.3333333335, ans=0.125 2023-11-26 00:21:47,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3132120.0, ans=0.125 2023-11-26 00:21:49,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3132120.0, ans=0.125 2023-11-26 00:21:55,551 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 900, loss[loss=0.03242, simple_loss=0.03444, pruned_loss=0.003331, audio_tagging_loss=0.01187, over 15004.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.0911, pruned_loss=0.01289, audio_tagging_loss=0.009308, over 3008343.49 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:22:06,574 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:22:09,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3132253.3333333335, ans=0.2 2023-11-26 00:22:16,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3132253.3333333335, ans=0.125 2023-11-26 00:22:18,257 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469850 2023-11-26 00:22:44,999 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:22:52,234 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 950, loss[loss=0.05861, simple_loss=0.08283, pruned_loss=0.01099, audio_tagging_loss=0.006211, over 15109.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09043, pruned_loss=0.01261, audio_tagging_loss=0.009245, over 3017271.25 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:23:06,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3132586.6666666665, ans=0.5 2023-11-26 00:23:14,126 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469900 2023-11-26 00:23:19,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.462e+01 9.352e+01 1.021e+02 1.286e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 00:23:30,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3132720.0, ans=12.0 2023-11-26 00:23:35,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2023-11-26 00:23:44,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.57 vs. limit=15.0 2023-11-26 00:23:47,645 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1000, loss[loss=0.08129, simple_loss=0.1138, pruned_loss=0.01878, audio_tagging_loss=0.005628, over 15627.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09103, pruned_loss=0.01259, audio_tagging_loss=0.009109, over 3027195.37 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:24:01,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-11-26 00:24:10,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469950 2023-11-26 00:24:12,218 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 00:24:20,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3132986.6666666665, ans=0.2 2023-11-26 00:24:31,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3133120.0, ans=0.1 2023-11-26 00:24:43,927 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1050, loss[loss=0.08135, simple_loss=0.1206, pruned_loss=0.01485, audio_tagging_loss=0.006209, over 16270.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09116, pruned_loss=0.01271, audio_tagging_loss=0.008975, over 3039907.17 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:25:00,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3133253.3333333335, ans=0.125 2023-11-26 00:25:01,690 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:25:06,937 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470000 2023-11-26 00:25:07,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0 2023-11-26 00:25:12,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.549e+01 9.309e+01 1.004e+02 1.287e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 00:25:12,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3133320.0, ans=0.125 2023-11-26 00:25:24,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3133386.6666666665, ans=0.125 2023-11-26 00:25:40,148 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1100, loss[loss=0.06507, simple_loss=0.09323, pruned_loss=0.01196, audio_tagging_loss=0.00649, over 15719.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09185, pruned_loss=0.01295, audio_tagging_loss=0.008846, over 3041428.82 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:25:40,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3133520.0, ans=0.125 2023-11-26 00:25:43,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2023-11-26 00:25:44,374 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 00:25:44,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3133520.0, ans=0.125 2023-11-26 00:25:47,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3133520.0, ans=0.1 2023-11-26 00:25:49,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3133520.0, ans=0.1 2023-11-26 00:25:52,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3133586.6666666665, ans=0.0 2023-11-26 00:26:02,877 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470050 2023-11-26 00:26:07,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3133653.3333333335, ans=0.125 2023-11-26 00:26:29,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3133786.6666666665, ans=0.0 2023-11-26 00:26:36,641 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1150, loss[loss=0.05202, simple_loss=0.06846, pruned_loss=0.006188, audio_tagging_loss=0.0116, over 14762.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09124, pruned_loss=0.01283, audio_tagging_loss=0.008906, over 3042991.60 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:26:42,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=22.5 2023-11-26 00:26:43,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3133853.3333333335, ans=0.0 2023-11-26 00:26:49,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3133920.0, ans=0.1 2023-11-26 00:26:58,335 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470100 2023-11-26 00:27:04,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.538e+01 9.163e+01 1.008e+02 1.257e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 00:27:05,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3133986.6666666665, ans=10.0 2023-11-26 00:27:08,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3133986.6666666665, ans=0.0 2023-11-26 00:27:13,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3134053.3333333335, ans=0.125 2023-11-26 00:27:14,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3134053.3333333335, ans=0.025 2023-11-26 00:27:18,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3134053.3333333335, ans=0.04949747468305833 2023-11-26 00:27:28,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3134120.0, ans=0.125 2023-11-26 00:27:32,277 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1200, loss[loss=0.0767, simple_loss=0.1125, pruned_loss=0.01281, audio_tagging_loss=0.007657, over 16272.00 frames. 
], tot_loss[loss=0.06707, simple_loss=0.09085, pruned_loss=0.01279, audio_tagging_loss=0.008855, over 3043364.22 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:27:49,685 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:27:55,260 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470150 2023-11-26 00:28:07,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3134386.6666666665, ans=0.125 2023-11-26 00:28:13,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3134386.6666666665, ans=0.125 2023-11-26 00:28:15,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3134386.6666666665, ans=0.125 2023-11-26 00:28:15,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3134386.6666666665, ans=0.125 2023-11-26 00:28:19,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3134453.3333333335, ans=0.0 2023-11-26 00:28:27,666 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1250, loss[loss=0.07569, simple_loss=0.102, pruned_loss=0.01637, audio_tagging_loss=0.008297, over 15774.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09008, pruned_loss=0.01268, audio_tagging_loss=0.008836, over 3047653.48 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:28:35,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3134520.0, ans=0.0 2023-11-26 00:28:47,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3134586.6666666665, ans=0.125 2023-11-26 00:28:49,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3134653.3333333335, ans=0.0 2023-11-26 00:28:50,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470200 2023-11-26 00:28:50,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3134653.3333333335, ans=0.125 2023-11-26 00:28:56,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.537e+01 9.082e+01 9.508e+01 1.462e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-26 00:29:10,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0 2023-11-26 00:29:23,810 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1300, loss[loss=0.07774, simple_loss=0.1199, pruned_loss=0.0111, audio_tagging_loss=0.006692, over 14343.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09138, pruned_loss=0.0129, audio_tagging_loss=0.008796, over 3051515.51 frames. 
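
Each "Epoch 40, batch N" record reports the combined objective alongside its three parts, and the logged numbers are consistent with a fixed linear combination: loss ≈ 0.5 · simple_loss + pruned_loss + audio_tagging_loss. For batch 1250 above, 0.5 · 0.09008 + 0.01268 + 0.008836 = 0.06656, matching the printed loss exactly. A sketch with the weights inferred from the logged values (not read from the training script):

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, tagging_scale=1.0):
    # simple_loss: smoothed full-sum transducer loss (used to guide pruning)
    # pruned_loss: pruned RNN-T loss evaluated only inside the pruned region
    # audio_tagging_loss: loss of the audio-tagging (KD) head
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# batch 1250 above: combined_loss(0.09008, 0.01268, 0.008836) -> 0.066556 ~= 0.06656
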
], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:29:45,589 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470250 2023-11-26 00:29:52,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3134986.6666666665, ans=0.2 2023-11-26 00:29:52,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3134986.6666666665, ans=0.2 2023-11-26 00:30:08,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3135120.0, ans=0.0 2023-11-26 00:30:10,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3135120.0, ans=0.125 2023-11-26 00:30:19,470 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1350, loss[loss=0.07549, simple_loss=0.1038, pruned_loss=0.0162, audio_tagging_loss=0.007408, over 16065.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09109, pruned_loss=0.01287, audio_tagging_loss=0.008869, over 3049807.43 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:30:23,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2023-11-26 00:30:32,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2023-11-26 00:30:41,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470300 2023-11-26 00:30:47,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.371e+01 9.120e+01 9.741e+01 1.134e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 00:31:00,786 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:31:08,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2023-11-26 00:31:11,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3135453.3333333335, ans=0.1 2023-11-26 00:31:14,771 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1400, loss[loss=0.07393, simple_loss=0.1048, pruned_loss=0.01322, audio_tagging_loss=0.008303, over 16023.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09073, pruned_loss=0.01283, audio_tagging_loss=0.008936, over 3054476.39 frames. 
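
The WARNING above (and its repeats elsewhere in this log) comes from a length filter: a 1-second AudioSet clip yields 100 feature frames, roughly 23 encoder frames after the ~4x subsampling, which is fewer than the 24 tokens of its dummy placeholder transcript, so no valid transducer alignment exists and the cut is dropped from ASR training. A sketch of such a filter, with a simplified subsampling formula that happens to reproduce 100 -> 23 (the exact expression depends on the front-end convolutions; names here are hypothetical):

def keep_cut(cut, sp, subsampling_factor=4):
    """Return False for cuts too short to align with their token sequence."""
    T_in = cut.num_frames                      # 100 for a 1 s cut at 10 ms hop
    # Rough encoder output length after subsampling (illustrative approximation):
    T_out = (T_in - 7) // subsampling_factor   # 100 -> 23, as in the warning
    U = len(sp.encode(cut.supervisions[0].text, out_type=str))  # 24 here
    # The alignment needs at least one encoder frame per emitted token;
    # otherwise the transducer loss would be infinite.
    return T_out >= U
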
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:31:19,910 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:31:20,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3135520.0, ans=0.1 2023-11-26 00:31:38,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470350 2023-11-26 00:31:54,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3135720.0, ans=0.0 2023-11-26 00:31:55,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2023-11-26 00:31:56,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3135720.0, ans=0.0 2023-11-26 00:32:04,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3135786.6666666665, ans=0.125 2023-11-26 00:32:11,756 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1450, loss[loss=0.06982, simple_loss=0.08823, pruned_loss=0.01479, audio_tagging_loss=0.01092, over 15451.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09105, pruned_loss=0.01288, audio_tagging_loss=0.008921, over 3045745.38 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:32:22,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2023-11-26 00:32:33,911 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470400 2023-11-26 00:32:40,435 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.703e+01 9.390e+01 1.022e+02 1.337e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 00:32:45,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3136053.3333333335, ans=0.025 2023-11-26 00:32:56,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0 2023-11-26 00:33:05,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3136120.0, ans=0.125 2023-11-26 00:33:06,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3136120.0, ans=0.125 2023-11-26 00:33:07,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3136186.6666666665, ans=0.125 2023-11-26 00:33:08,193 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1500, loss[loss=0.06347, simple_loss=0.08222, pruned_loss=0.01165, audio_tagging_loss=0.0107, over 14949.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09035, pruned_loss=0.01281, audio_tagging_loss=0.009094, over 3048419.15 frames. 
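
The optim.py:476 records summarize the recent distribution of gradient norms as five order statistics (min, 25%, 50%, 75%, max) together with the clipping threshold currently in force. Across these records the threshold tracks Clipping_scale times the median: in the record above, 2.0 × 9.390e+01 = 1.878e+02, and percent-clipped=0.0 says no recent gradient exceeded it. A simplified sketch of median-based clipping (the optimizer's actual bookkeeping is per-parameter-group and more involved):

import torch

def clip_to_median(grad: torch.Tensor, recent_norms, clipping_scale: float = 2.0):
    """Scale grad down if its norm exceeds clipping_scale x median of recent norms."""
    median = torch.quantile(torch.tensor(recent_norms), 0.5)
    threshold = clipping_scale * median
    norm = grad.norm()
    if norm > threshold:
        grad = grad * (threshold / norm)  # such events are counted in percent-clipped
    return grad, float(threshold)
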
], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:33:08,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3136186.6666666665, ans=0.0 2023-11-26 00:33:12,678 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:33:17,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3136253.3333333335, ans=0.125 2023-11-26 00:33:25,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3136253.3333333335, ans=0.0 2023-11-26 00:33:30,721 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470450 2023-11-26 00:33:43,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3136386.6666666665, ans=0.125 2023-11-26 00:33:52,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3136453.3333333335, ans=0.125 2023-11-26 00:33:58,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3136453.3333333335, ans=0.125 2023-11-26 00:34:00,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3136453.3333333335, ans=0.125 2023-11-26 00:34:03,650 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1550, loss[loss=0.05518, simple_loss=0.07062, pruned_loss=0.009173, audio_tagging_loss=0.01069, over 15856.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09058, pruned_loss=0.01278, audio_tagging_loss=0.009139, over 3043599.45 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:34:13,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3136520.0, ans=0.125 2023-11-26 00:34:14,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136586.6666666665, ans=0.1 2023-11-26 00:34:26,669 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470500 2023-11-26 00:34:33,564 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.668e+01 9.304e+01 9.957e+01 1.824e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 00:34:41,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3136720.0, ans=0.125 2023-11-26 00:34:43,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3136720.0, ans=0.125 2023-11-26 00:34:45,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3136720.0, ans=0.0 2023-11-26 00:34:48,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2023-11-26 00:34:59,526 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1600, loss[loss=0.07383, simple_loss=0.104, pruned_loss=0.0136, audio_tagging_loss=0.008243, over 15104.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09088, pruned_loss=0.01282, audio_tagging_loss=0.009157, over 3045720.94 frames. 
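
The scaling.py:1118 WithLoss entries attach an auxiliary penalty to the self-attention weights and report its current sum; loss-sum=0.000e+00 throughout this stretch means the weights sit inside the allowed region and the penalty is inactive. One standard way to implement such a hook, shown here as a hedged sketch rather than the exact icefall mechanism, is an identity autograd function that routes an extra unit gradient into the penalty term:

import torch

class AttachAuxLoss(torch.autograd.Function):
    # forward returns x unchanged; backward passes the incoming gradient
    # through to x and also sends a unit gradient into aux_loss, so the
    # optimizer minimizes the penalty alongside the main objective.
    @staticmethod
    def forward(ctx, x, aux_loss):
        ctx.aux_shape = aux_loss.shape
        return x

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, torch.ones(ctx.aux_shape, dtype=grad_out.dtype,
                                    device=grad_out.device)

# hypothetical usage on attention weights w:
#   w = AttachAuxLoss.apply(w, penalty_fn(w))  # adds penalty_fn(w).sum() to the loss
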
], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:35:02,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3136853.3333333335, ans=15.0 2023-11-26 00:35:22,166 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470550 2023-11-26 00:35:23,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3136986.6666666665, ans=0.125 2023-11-26 00:35:27,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3136986.6666666665, ans=0.1 2023-11-26 00:35:51,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3137120.0, ans=0.125 2023-11-26 00:35:55,998 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1650, loss[loss=0.08006, simple_loss=0.09389, pruned_loss=0.02246, audio_tagging_loss=0.01066, over 14094.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09051, pruned_loss=0.01278, audio_tagging_loss=0.009135, over 3047559.78 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:35:56,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2023-11-26 00:35:57,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3137186.6666666665, ans=0.07 2023-11-26 00:36:03,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3137186.6666666665, ans=0.0 2023-11-26 00:36:06,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3137253.3333333335, ans=0.125 2023-11-26 00:36:07,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3137253.3333333335, ans=0.125 2023-11-26 00:36:17,253 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470600 2023-11-26 00:36:24,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 8.506e+01 9.125e+01 1.020e+02 1.203e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-26 00:36:26,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3137320.0, ans=0.0 2023-11-26 00:36:37,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3137386.6666666665, ans=0.125 2023-11-26 00:36:39,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3137453.3333333335, ans=0.2 2023-11-26 00:36:40,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3137453.3333333335, ans=0.0 2023-11-26 00:36:43,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-11-26 00:36:51,038 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1700, loss[loss=0.08063, simple_loss=0.105, pruned_loss=0.01861, audio_tagging_loss=0.009548, over 14768.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09123, pruned_loss=0.01282, audio_tagging_loss=0.009206, over 3047256.02 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:36:55,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3137520.0, ans=0.125 2023-11-26 00:36:55,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.50 vs. limit=22.5 2023-11-26 00:37:00,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3137520.0, ans=0.2 2023-11-26 00:37:08,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3137586.6666666665, ans=0.2 2023-11-26 00:37:09,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3137586.6666666665, ans=0.07 2023-11-26 00:37:12,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-26 00:37:13,046 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470650 2023-11-26 00:37:17,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3137653.3333333335, ans=0.09899494936611666 2023-11-26 00:37:20,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=12.0 2023-11-26 00:37:21,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3137653.3333333335, ans=10.0 2023-11-26 00:37:21,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-26 00:37:24,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3137720.0, ans=0.0 2023-11-26 00:37:26,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3137720.0, ans=0.125 2023-11-26 00:37:46,335 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1750, loss[loss=0.07218, simple_loss=0.09451, pruned_loss=0.01738, audio_tagging_loss=0.007546, over 15410.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09043, pruned_loss=0.01269, audio_tagging_loss=0.009187, over 3039336.61 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:38:05,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3137920.0, ans=0.125 2023-11-26 00:38:06,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3137920.0, ans=0.125 2023-11-26 00:38:07,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3137986.6666666665, ans=0.2 2023-11-26 00:38:08,742 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470700 2023-11-26 00:38:16,171 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.541e+01 8.977e+01 9.696e+01 1.531e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-26 00:38:42,298 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1800, loss[loss=0.09148, simple_loss=0.1219, pruned_loss=0.02094, audio_tagging_loss=0.009577, over 14652.00 frames. 
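
The scaling.py:1022 Whitening entries compare a measured statistic against a limit (metric=18.50 vs. limit=22.5 above): the metric gauges how far the channel covariance of an activation is from a multiple of the identity. It is 1.0 for perfectly "white" features and grows as variance concentrates in fewer directions; the module only applies its gradient penalty once the metric exceeds the limit, so most of these entries are merely observational. A sketch of one such statistic, hedged, since the precise icefall formula may differ in detail:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Returns ~1.0 if each channel group's
    covariance is proportional to the identity, larger otherwise."""
    n, d = x.shape
    g = d // num_groups
    vals = []
    for i in range(num_groups):
        xg = x[:, i * g:(i + 1) * g]
        xg = xg - xg.mean(dim=0, keepdim=True)
        cov = (xg.T @ xg) / n                      # (g, g) covariance
        # trace(C @ C) = sum of squared eigenvalues, trace(C) = their sum:
        vals.append(g * (cov * cov).sum() / (cov.trace() ** 2 + 1e-20))
    return torch.stack(vals).mean().item()
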
], tot_loss[loss=0.0669, simple_loss=0.09058, pruned_loss=0.0126, audio_tagging_loss=0.009013, over 3028626.26 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:38:44,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138186.6666666665, ans=0.1 2023-11-26 00:39:03,969 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470750 2023-11-26 00:39:14,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3138386.6666666665, ans=0.125 2023-11-26 00:39:24,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3138386.6666666665, ans=0.1 2023-11-26 00:39:32,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. limit=5.0 2023-11-26 00:39:33,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3138453.3333333335, ans=0.125 2023-11-26 00:39:37,363 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1850, loss[loss=0.06301, simple_loss=0.07752, pruned_loss=0.01436, audio_tagging_loss=0.009889, over 15006.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09052, pruned_loss=0.01267, audio_tagging_loss=0.008953, over 3032721.74 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:39:40,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3138520.0, ans=0.07 2023-11-26 00:39:51,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3138586.6666666665, ans=0.0 2023-11-26 00:39:59,048 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470800 2023-11-26 00:40:02,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3138653.3333333335, ans=0.125 2023-11-26 00:40:03,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3138653.3333333335, ans=0.0 2023-11-26 00:40:07,148 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.736e+01 9.136e+01 9.723e+01 1.171e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 00:40:11,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3138720.0, ans=0.2 2023-11-26 00:40:32,780 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1900, loss[loss=0.05425, simple_loss=0.07355, pruned_loss=0.008694, audio_tagging_loss=0.008786, over 14547.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09022, pruned_loss=0.01249, audio_tagging_loss=0.008917, over 3031501.97 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:40:40,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138853.3333333335, ans=0.1 2023-11-26 00:40:55,378 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470850 2023-11-26 00:41:02,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. 
limit=10.0 2023-11-26 00:41:11,248 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:41:15,490 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:41:28,555 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1950, loss[loss=0.05459, simple_loss=0.07632, pruned_loss=0.008157, audio_tagging_loss=0.008277, over 15107.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09009, pruned_loss=0.01249, audio_tagging_loss=0.008943, over 3034479.11 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:41:48,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3139253.3333333335, ans=0.125 2023-11-26 00:41:50,560 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470900 2023-11-26 00:41:59,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.427e+01 9.159e+01 1.002e+02 1.233e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 00:42:07,764 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:42:24,492 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2000, loss[loss=0.05849, simple_loss=0.079, pruned_loss=0.01149, audio_tagging_loss=0.007502, over 14646.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08932, pruned_loss=0.01243, audio_tagging_loss=0.008971, over 3025663.29 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:42:31,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3139520.0, ans=0.0 2023-11-26 00:42:40,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3139586.6666666665, ans=0.0 2023-11-26 00:42:46,811 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470950 2023-11-26 00:43:01,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3139720.0, ans=0.125 2023-11-26 00:43:19,542 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2050, loss[loss=0.0644, simple_loss=0.08848, pruned_loss=0.01245, audio_tagging_loss=0.007705, over 15274.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08966, pruned_loss=0.01247, audio_tagging_loss=0.008959, over 3026448.75 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:43:22,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. 
limit=15.0 2023-11-26 00:43:25,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3139853.3333333335, ans=0.125 2023-11-26 00:43:28,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3139853.3333333335, ans=0.0 2023-11-26 00:43:41,853 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471000 2023-11-26 00:43:50,088 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.583e+01 9.206e+01 9.963e+01 1.276e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 00:43:54,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-26 00:43:58,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3140053.3333333335, ans=0.2 2023-11-26 00:44:12,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3140120.0, ans=0.04949747468305833 2023-11-26 00:44:15,654 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2100, loss[loss=0.08203, simple_loss=0.1142, pruned_loss=0.0167, audio_tagging_loss=0.008248, over 14055.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09, pruned_loss=0.01268, audio_tagging_loss=0.008941, over 3024414.33 frames. ], batch size: 52, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:44:25,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3140186.6666666665, ans=0.125 2023-11-26 00:44:38,006 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471050 2023-11-26 00:44:38,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2023-11-26 00:45:01,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3140453.3333333335, ans=0.1 2023-11-26 00:45:04,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3140453.3333333335, ans=0.1 2023-11-26 00:45:11,031 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2150, loss[loss=0.07143, simple_loss=0.09438, pruned_loss=0.01539, audio_tagging_loss=0.008843, over 15578.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08956, pruned_loss=0.01266, audio_tagging_loss=0.00891, over 3020585.63 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:45:33,541 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471100 2023-11-26 00:45:41,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.773e+01 9.255e+01 9.995e+01 1.124e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 00:45:45,891 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 00:45:49,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3140720.0, ans=0.09899494936611666 2023-11-26 00:46:06,711 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2200, loss[loss=0.07149, simple_loss=0.08437, pruned_loss=0.0182, audio_tagging_loss=0.01111, over 15267.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09048, pruned_loss=0.01274, audio_tagging_loss=0.00886, over 3032422.35 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:46:08,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3140853.3333333335, ans=0.95 2023-11-26 00:46:09,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2023-11-26 00:46:29,025 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471150 2023-11-26 00:46:39,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3141053.3333333335, ans=0.0 2023-11-26 00:46:48,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3141053.3333333335, ans=0.1 2023-11-26 00:46:53,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0 2023-11-26 00:46:56,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3141120.0, ans=0.0 2023-11-26 00:47:01,676 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2250, loss[loss=0.08427, simple_loss=0.1216, pruned_loss=0.01706, audio_tagging_loss=0.00642, over 14650.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09119, pruned_loss=0.01278, audio_tagging_loss=0.008849, over 3036488.16 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:47:09,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2023-11-26 00:47:23,525 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471200 2023-11-26 00:47:32,699 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.619e+01 9.398e+01 1.009e+02 1.153e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 00:47:35,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3141386.6666666665, ans=0.1 2023-11-26 00:47:52,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3141453.3333333335, ans=0.1 2023-11-26 00:47:57,262 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2300, loss[loss=0.08558, simple_loss=0.1279, pruned_loss=0.01385, audio_tagging_loss=0.007797, over 15999.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09166, pruned_loss=0.0127, audio_tagging_loss=0.008977, over 3036227.09 frames. 
], batch size: 60, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:48:07,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3141586.6666666665, ans=0.2 2023-11-26 00:48:19,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471250 2023-11-26 00:48:41,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-26 00:48:42,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3141786.6666666665, ans=0.2 2023-11-26 00:48:43,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3141786.6666666665, ans=0.0 2023-11-26 00:48:46,549 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:48:47,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-26 00:48:52,334 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2350, loss[loss=0.0577, simple_loss=0.07733, pruned_loss=0.0108, audio_tagging_loss=0.008234, over 14768.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09153, pruned_loss=0.01257, audio_tagging_loss=0.008989, over 3040766.13 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:48:53,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3141853.3333333335, ans=0.1 2023-11-26 00:48:54,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3141853.3333333335, ans=0.0 2023-11-26 00:49:11,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3141920.0, ans=0.0 2023-11-26 00:49:14,615 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471300 2023-11-26 00:49:23,017 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.557e+01 9.249e+01 9.915e+01 1.418e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 00:49:27,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3142053.3333333335, ans=0.125 2023-11-26 00:49:30,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3142053.3333333335, ans=0.125 2023-11-26 00:49:32,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3142053.3333333335, ans=0.05 2023-11-26 00:49:38,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3142120.0, ans=0.125 2023-11-26 00:49:44,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3142120.0, ans=0.0 2023-11-26 00:49:48,003 
INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2400, loss[loss=0.08181, simple_loss=0.1159, pruned_loss=0.01494, audio_tagging_loss=0.008926, over 15769.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09038, pruned_loss=0.01247, audio_tagging_loss=0.009108, over 3040496.66 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:49:59,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3142253.3333333335, ans=0.125 2023-11-26 00:50:01,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3142253.3333333335, ans=0.0 2023-11-26 00:50:02,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-11-26 00:50:05,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3142253.3333333335, ans=0.0 2023-11-26 00:50:09,811 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471350 2023-11-26 00:50:19,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3142386.6666666665, ans=0.0 2023-11-26 00:50:30,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-11-26 00:50:43,039 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2450, loss[loss=0.07304, simple_loss=0.09866, pruned_loss=0.0127, audio_tagging_loss=0.01101, over 15378.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09006, pruned_loss=0.01236, audio_tagging_loss=0.009223, over 3041428.38 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:50:46,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3142520.0, ans=0.125 2023-11-26 00:50:48,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-11-26 00:50:58,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2023-11-26 00:51:05,270 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471400 2023-11-26 00:51:07,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142653.3333333335, ans=0.1 2023-11-26 00:51:13,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.694e+01 9.441e+01 1.025e+02 1.251e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 00:51:20,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3142720.0, ans=0.0 2023-11-26 00:51:24,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3142720.0, ans=0.2 2023-11-26 00:51:36,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3142853.3333333335, ans=0.125 2023-11-26 00:51:37,674 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2500, loss[loss=0.07819, simple_loss=0.1136, pruned_loss=0.01425, audio_tagging_loss=0.007153, over 15114.00 frames. 
], tot_loss[loss=0.06681, simple_loss=0.09028, pruned_loss=0.01241, audio_tagging_loss=0.009253, over 3044416.28 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:51:50,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0 2023-11-26 00:51:58,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3142920.0, ans=0.0 2023-11-26 00:52:00,121 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471450 2023-11-26 00:52:14,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3143053.3333333335, ans=0.125 2023-11-26 00:52:16,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3143053.3333333335, ans=0.2 2023-11-26 00:52:23,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3143120.0, ans=0.1 2023-11-26 00:52:30,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=22.5 2023-11-26 00:52:33,327 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2550, loss[loss=0.06987, simple_loss=0.09813, pruned_loss=0.01334, audio_tagging_loss=0.007471, over 15524.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09025, pruned_loss=0.01238, audio_tagging_loss=0.009144, over 3046577.60 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:52:37,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3143186.6666666665, ans=0.125 2023-11-26 00:52:38,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3143186.6666666665, ans=0.125 2023-11-26 00:52:54,902 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471500 2023-11-26 00:53:03,217 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.571e+01 9.048e+01 1.003e+02 1.375e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-26 00:53:27,802 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2600, loss[loss=0.0678, simple_loss=0.0909, pruned_loss=0.01403, audio_tagging_loss=0.008323, over 15240.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09001, pruned_loss=0.01237, audio_tagging_loss=0.00902, over 3048134.46 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:53:33,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3143520.0, ans=0.2 2023-11-26 00:53:49,062 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471550 2023-11-26 00:54:21,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3143853.3333333335, ans=0.0 2023-11-26 00:54:22,453 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2650, loss[loss=0.08772, simple_loss=0.1196, pruned_loss=0.02167, audio_tagging_loss=0.006267, over 17213.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.0902, pruned_loss=0.01235, audio_tagging_loss=0.008888, over 3052230.42 frames. 
], batch size: 65, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:54:23,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3143853.3333333335, ans=0.0 2023-11-26 00:54:26,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3143853.3333333335, ans=0.2 2023-11-26 00:54:44,941 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471600 2023-11-26 00:54:46,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3143986.6666666665, ans=0.125 2023-11-26 00:54:49,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2023-11-26 00:54:54,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.622e+01 9.253e+01 1.030e+02 1.251e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 00:54:55,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2023-11-26 00:54:56,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3144053.3333333335, ans=0.04949747468305833 2023-11-26 00:55:10,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3144120.0, ans=0.0 2023-11-26 00:55:18,677 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2700, loss[loss=0.05997, simple_loss=0.07871, pruned_loss=0.01045, audio_tagging_loss=0.01017, over 14664.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08978, pruned_loss=0.01238, audio_tagging_loss=0.00892, over 3053616.62 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:55:18,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3144186.6666666665, ans=0.0 2023-11-26 00:55:22,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3144186.6666666665, ans=0.1 2023-11-26 00:55:32,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3144253.3333333335, ans=0.1 2023-11-26 00:55:41,226 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471650 2023-11-26 00:56:15,106 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2750, loss[loss=0.03876, simple_loss=0.05228, pruned_loss=0.004436, audio_tagging_loss=0.008188, over 15375.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08938, pruned_loss=0.01238, audio_tagging_loss=0.00889, over 3049203.87 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:56:20,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=22.5 2023-11-26 00:56:36,233 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471700 2023-11-26 00:56:38,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. 
limit=15.0 2023-11-26 00:56:45,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.564e+01 9.385e+01 1.025e+02 1.216e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 00:57:02,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3144786.6666666665, ans=0.0 2023-11-26 00:57:03,953 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:57:10,306 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2800, loss[loss=0.05138, simple_loss=0.06449, pruned_loss=0.01187, audio_tagging_loss=0.007261, over 13888.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08968, pruned_loss=0.01242, audio_tagging_loss=0.008959, over 3049844.36 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:57:33,153 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471750 2023-11-26 00:57:37,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3144986.6666666665, ans=0.0 2023-11-26 00:58:05,872 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2850, loss[loss=0.08587, simple_loss=0.122, pruned_loss=0.01569, audio_tagging_loss=0.009165, over 15023.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08986, pruned_loss=0.01242, audio_tagging_loss=0.008993, over 3049825.41 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:58:10,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3145186.6666666665, ans=0.1 2023-11-26 00:58:15,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3145186.6666666665, ans=0.125 2023-11-26 00:58:28,861 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471800 2023-11-26 00:58:38,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.919e+01 8.895e+01 9.329e+01 9.789e+01 1.221e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 00:58:45,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3145386.6666666665, ans=0.0 2023-11-26 00:59:02,281 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2900, loss[loss=0.07359, simple_loss=0.1035, pruned_loss=0.01584, audio_tagging_loss=0.006003, over 15458.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.0894, pruned_loss=0.01241, audio_tagging_loss=0.008975, over 3051374.12 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:59:18,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3145586.6666666665, ans=0.125 2023-11-26 00:59:21,354 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:59:24,315 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471850 2023-11-26 00:59:24,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-26 00:59:24,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-26 00:59:36,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3145720.0, ans=0.125 2023-11-26 00:59:58,012 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2950, loss[loss=0.058, simple_loss=0.0789, pruned_loss=0.009683, audio_tagging_loss=0.008865, over 15446.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09025, pruned_loss=0.01249, audio_tagging_loss=0.008923, over 3042087.68 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:00:20,332 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471900 2023-11-26 01:00:31,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.856e+01 9.351e+01 9.999e+01 2.175e+02, threshold=1.870e+02, percent-clipped=2.0 2023-11-26 01:00:35,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-26 01:00:53,314 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3000, loss[loss=0.07497, simple_loss=0.1054, pruned_loss=0.01114, audio_tagging_loss=0.01113, over 15202.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09064, pruned_loss=0.01267, audio_tagging_loss=0.009002, over 3048686.51 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:00:53,315 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 01:01:25,505 INFO [train_asr.py:1267] (1/4) Epoch 40, validation: loss=0.05777, simple_loss=0.05069, pruned_loss=0.005189, audio_tagging_loss=0.02724, over 4681554.00 frames. 2023-11-26 01:01:25,506 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 01:01:35,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2023-11-26 01:01:42,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=15.0 2023-11-26 01:01:46,808 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471950 2023-11-26 01:02:13,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=15.0 2023-11-26 01:02:20,663 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3050, loss[loss=0.07819, simple_loss=0.1008, pruned_loss=0.01737, audio_tagging_loss=0.01043, over 15766.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09116, pruned_loss=0.01274, audio_tagging_loss=0.009017, over 3053152.95 frames. 
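
grad_scale in the batch records is not constant: it moves between 8.0, 16.0 and 32.0 across this stretch. With fp16 training this is almost certainly the dynamic loss scale of torch.cuda.amp.GradScaler, which halves the scale (and skips the step) when a scaled gradient overflows and grows it back after a run of stable steps, producing exactly this sawtooth. The same stretch also shows the periodic validation pass at batch 3000 and the peak GPU memory reported so far. A minimal, runnable usage sketch (the model and data are synthetic stand-ins):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(80, 500).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(100):
    x = torch.randn(8, 80, device=device)
    opt.zero_grad()
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()   # backward on (scale * loss)
    scaler.step(opt)                # unscales grads; skips the step on inf/nan
    scaler.update()                 # halves scale on overflow, else slowly grows it
    # scaler.get_scale() corresponds to the grad_scale printed in the log records
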
], batch size: 56, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:02:20,915 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:02:38,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3146586.6666666665, ans=0.2 2023-11-26 01:02:40,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.39 vs. limit=22.5 2023-11-26 01:02:42,828 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472000 2023-11-26 01:02:56,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.712e+01 9.411e+01 1.021e+02 1.458e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 01:02:56,988 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:03:06,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3146786.6666666665, ans=0.125 2023-11-26 01:03:08,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2023-11-26 01:03:10,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3146786.6666666665, ans=0.125 2023-11-26 01:03:18,242 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3100, loss[loss=0.05335, simple_loss=0.06769, pruned_loss=0.009734, audio_tagging_loss=0.009776, over 14962.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09004, pruned_loss=0.01259, audio_tagging_loss=0.009114, over 3057281.91 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:03:22,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-26 01:03:24,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3146853.3333333335, ans=0.125 2023-11-26 01:03:34,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3146920.0, ans=0.125 2023-11-26 01:03:41,114 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472050 2023-11-26 01:03:43,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=22.5 2023-11-26 01:03:46,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3146986.6666666665, ans=0.0 2023-11-26 01:03:51,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. 
limit=15.0 2023-11-26 01:04:14,258 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3150, loss[loss=0.06137, simple_loss=0.08104, pruned_loss=0.008521, audio_tagging_loss=0.01233, over 15390.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.0908, pruned_loss=0.01274, audio_tagging_loss=0.009091, over 3046368.38 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:04:14,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3147186.6666666665, ans=0.2 2023-11-26 01:04:17,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3147186.6666666665, ans=0.0 2023-11-26 01:04:18,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3147186.6666666665, ans=0.125 2023-11-26 01:04:36,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472100 2023-11-26 01:04:47,289 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.868e+01 9.358e+01 9.908e+01 1.230e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 01:04:53,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3147386.6666666665, ans=0.0 2023-11-26 01:05:07,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3147453.3333333335, ans=0.125 2023-11-26 01:05:09,989 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3200, loss[loss=0.06105, simple_loss=0.08065, pruned_loss=0.01204, audio_tagging_loss=0.008685, over 15113.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09046, pruned_loss=0.01269, audio_tagging_loss=0.009207, over 3055277.31 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:05:12,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3147520.0, ans=0.2 2023-11-26 01:05:17,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3147520.0, ans=0.125 2023-11-26 01:05:32,075 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472150 2023-11-26 01:05:45,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-26 01:06:04,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3147853.3333333335, ans=0.125 2023-11-26 01:06:04,946 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3250, loss[loss=0.07512, simple_loss=0.09868, pruned_loss=0.01745, audio_tagging_loss=0.008331, over 15139.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09086, pruned_loss=0.01269, audio_tagging_loss=0.009211, over 3054026.84 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:06:05,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3147853.3333333335, ans=0.2 2023-11-26 01:06:07,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3147853.3333333335, ans=0.1 2023-11-26 01:06:10,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3147853.3333333335, ans=0.0 2023-11-26 01:06:22,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3147920.0, ans=0.125 2023-11-26 01:06:27,283 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472200 2023-11-26 01:06:28,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3147986.6666666665, ans=0.125 2023-11-26 01:06:28,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3147986.6666666665, ans=0.0 2023-11-26 01:06:38,678 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.733e+01 9.362e+01 1.015e+02 1.651e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 01:06:59,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3148120.0, ans=0.2 2023-11-26 01:07:01,107 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3300, loss[loss=0.06496, simple_loss=0.09045, pruned_loss=0.01174, audio_tagging_loss=0.007992, over 15026.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.09202, pruned_loss=0.01302, audio_tagging_loss=0.009169, over 3055976.16 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:07:01,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2023-11-26 01:07:08,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3148186.6666666665, ans=0.0 2023-11-26 01:07:23,471 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472250 2023-11-26 01:07:27,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3148320.0, ans=0.2 2023-11-26 01:07:38,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3148386.6666666665, ans=0.1 2023-11-26 01:07:38,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3148386.6666666665, ans=0.125 2023-11-26 01:07:57,005 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3350, loss[loss=0.06162, simple_loss=0.08633, pruned_loss=0.009613, audio_tagging_loss=0.008839, over 14935.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09151, pruned_loss=0.01295, audio_tagging_loss=0.009136, over 3055863.85 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:08:05,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3148520.0, ans=0.0 2023-11-26 01:08:19,834 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472300 2023-11-26 01:08:24,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3148653.3333333335, ans=6.0 2023-11-26 01:08:29,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=8.0 2023-11-26 01:08:30,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3148720.0, ans=0.1 2023-11-26 01:08:30,881 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.683e+01 9.249e+01 1.019e+02 1.203e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 01:08:31,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3148720.0, ans=0.125 2023-11-26 01:08:52,805 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3400, loss[loss=0.0644, simple_loss=0.09042, pruned_loss=0.01006, audio_tagging_loss=0.00913, over 15450.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09133, pruned_loss=0.01275, audio_tagging_loss=0.008999, over 3054623.72 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:08:55,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.40 vs. limit=10.0 2023-11-26 01:08:59,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0 2023-11-26 01:09:06,873 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:09:08,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3148920.0, ans=0.0 2023-11-26 01:09:15,658 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472350 2023-11-26 01:09:16,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-26 01:09:28,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-26 01:09:48,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3149186.6666666665, ans=0.125 2023-11-26 01:09:48,857 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3450, loss[loss=0.07432, simple_loss=0.106, pruned_loss=0.01145, audio_tagging_loss=0.009894, over 15425.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.0916, pruned_loss=0.01276, audio_tagging_loss=0.008969, over 3051042.46 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:09:56,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.81 vs. 
limit=22.5 2023-11-26 01:10:02,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3149253.3333333335, ans=0.0 2023-11-26 01:10:07,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3149253.3333333335, ans=0.125 2023-11-26 01:10:11,417 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472400 2023-11-26 01:10:16,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3149320.0, ans=0.0 2023-11-26 01:10:17,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2023-11-26 01:10:22,057 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.810e+01 9.451e+01 1.004e+02 1.366e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 01:10:24,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=22.5 2023-11-26 01:10:25,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3149386.6666666665, ans=0.1 2023-11-26 01:10:40,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3149453.3333333335, ans=0.035 2023-11-26 01:10:42,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3149453.3333333335, ans=0.125 2023-11-26 01:10:44,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3149520.0, ans=0.2 2023-11-26 01:10:45,054 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3500, loss[loss=0.06567, simple_loss=0.07853, pruned_loss=0.01479, audio_tagging_loss=0.01161, over 14648.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09035, pruned_loss=0.01265, audio_tagging_loss=0.00894, over 3049513.26 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:11:02,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3149586.6666666665, ans=0.04949747468305833 2023-11-26 01:11:08,040 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472450 2023-11-26 01:11:15,500 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:11:33,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3149786.6666666665, ans=0.1 2023-11-26 01:11:36,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2023-11-26 01:11:40,875 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3550, loss[loss=0.05475, simple_loss=0.07483, pruned_loss=0.007752, audio_tagging_loss=0.00958, over 14525.00 frames. 
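], tot_loss[loss=0.06648, simple_loss=0.08988, pruned_loss=0.01261, audio_tagging_loss=0.008929, over 3046062.43 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0

The learning rate ticks down from 1.70e-03 to 1.69e-03 at this point, consistent with the Eden schedule configured for this run (base_lr=0.045, lr_batches=7500, lr_epochs=3.5). A sketch of the schedule (a close paraphrase of the formula in icefall's optim.py, not a verbatim copy):

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Both factors decay like 1/sqrt(x) once batch >> lr_batches
        # (resp. epoch >> lr_epochs), giving a smooth two-axis decay.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # e.g. eden_lr(0.045, batch=472500, epoch=39) ~= 1.69e-03,
    # matching the "lr: 1.69e-03" printed in the surrounding records.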
2023-11-26 01:12:00,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3149920.0, ans=0.125 2023-11-26 01:12:04,005 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472500 2023-11-26 01:12:05,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3149986.6666666665, ans=0.0 2023-11-26 01:12:14,564 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.583e+01 9.059e+01 9.596e+01 1.364e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-26 01:12:32,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3150120.0, ans=0.125 2023-11-26 01:12:37,536 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3600, loss[loss=0.07595, simple_loss=0.1027, pruned_loss=0.01546, audio_tagging_loss=0.009113, over 15393.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08976, pruned_loss=0.01257, audio_tagging_loss=0.008952, over 3049547.01 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:12:51,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3150253.3333333335, ans=0.125 2023-11-26 01:12:59,406 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472550 2023-11-26 01:13:00,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3150320.0, ans=0.125 2023-11-26 01:13:17,290 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:13:33,441 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3650, loss[loss=0.05356, simple_loss=0.06694, pruned_loss=0.01004, audio_tagging_loss=0.01005, over 14714.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.0894, pruned_loss=0.01244, audio_tagging_loss=0.00888, over 3043102.94 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:13:41,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2023-11-26 01:13:45,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3150586.6666666665, ans=0.1 2023-11-26 01:13:55,258 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472600 2023-11-26 01:13:57,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-11-26 01:14:08,548 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.763e+01 9.129e+01 9.774e+01 1.635e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-26 01:14:16,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.42 vs.
limit=22.5 2023-11-26 01:14:25,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3150786.6666666665, ans=0.1 2023-11-26 01:14:28,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.75 vs. limit=15.0 2023-11-26 01:14:28,703 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3700, loss[loss=0.04277, simple_loss=0.05146, pruned_loss=0.005585, audio_tagging_loss=0.01146, over 14374.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08963, pruned_loss=0.01245, audio_tagging_loss=0.008889, over 3043343.17 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:14:31,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3150853.3333333335, ans=0.0 2023-11-26 01:14:46,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3150920.0, ans=0.2 2023-11-26 01:14:52,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472650 2023-11-26 01:14:57,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3150986.6666666665, ans=0.125 2023-11-26 01:15:11,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3151053.3333333335, ans=0.1 2023-11-26 01:15:12,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3151120.0, ans=0.125 2023-11-26 01:15:25,866 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3750, loss[loss=0.05679, simple_loss=0.07427, pruned_loss=0.009914, audio_tagging_loss=0.009742, over 15110.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.0902, pruned_loss=0.01258, audio_tagging_loss=0.008895, over 3051000.11 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:15:47,755 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472700 2023-11-26 01:15:49,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3151320.0, ans=0.2 2023-11-26 01:15:59,313 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.900e+01 9.433e+01 1.035e+02 1.729e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 01:16:06,235 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:16:10,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0 2023-11-26 01:16:21,671 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3800, loss[loss=0.05212, simple_loss=0.07246, pruned_loss=0.005925, audio_tagging_loss=0.009967, over 15773.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09057, pruned_loss=0.01264, audio_tagging_loss=0.009015, over 3061142.31 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:16:29,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3151520.0, ans=0.1 2023-11-26 01:16:42,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3151653.3333333335, ans=10.0 2023-11-26 01:16:43,101 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472750 2023-11-26 01:16:47,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3151653.3333333335, ans=0.125 2023-11-26 01:16:48,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3151653.3333333335, ans=0.125 2023-11-26 01:16:55,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3151720.0, ans=0.125 2023-11-26 01:17:09,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3151786.6666666665, ans=0.125 2023-11-26 01:17:16,308 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3850, loss[loss=0.05475, simple_loss=0.06259, pruned_loss=0.0126, audio_tagging_loss=0.01085, over 14500.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08972, pruned_loss=0.01251, audio_tagging_loss=0.009102, over 3056257.66 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:17:18,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=15.0 2023-11-26 01:17:27,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3151920.0, ans=0.2 2023-11-26 01:17:34,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3151920.0, ans=0.035 2023-11-26 01:17:39,139 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472800 2023-11-26 01:17:51,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.590e+01 9.252e+01 9.700e+01 1.619e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 01:17:52,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=22.5 2023-11-26 01:17:56,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.00 vs. limit=22.5 2023-11-26 01:17:57,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3152053.3333333335, ans=0.125 2023-11-26 01:18:02,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3152120.0, ans=0.0 2023-11-26 01:18:09,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.04 vs. limit=10.0 2023-11-26 01:18:12,591 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3900, loss[loss=0.05405, simple_loss=0.07842, pruned_loss=0.006495, audio_tagging_loss=0.00835, over 14468.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08948, pruned_loss=0.01248, audio_tagging_loss=0.009129, over 3050779.11 frames. 
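], batch size: 56, lr: 1.69e-03, grad_scale: 16.0

The [optim.py:476] records track ScaledAdam's gradient clipping: each line prints the (min, 25%, 50%, 75%, max) quartiles of recently observed gradient norms, and the clipping threshold appears to follow clipping_scale times the median, e.g. 2.0 x 9.252e+01 ~= 1.850e+02 in the record above; percent-clipped=0.0 says no batch in the window exceeded it. A sketch of that bookkeeping (assumed; the real logic is ScaledAdam in icefall's optim.py):

    import torch

    def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        """Quartiles, threshold and percent-clipped, as printed in the log."""
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]                  # 2.0 * median
        pct = (recent_norms > threshold).float().mean() * 100.0
        return q, threshold.item(), pct.item()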
2023-11-26 01:18:17,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3152186.6666666665, ans=0.0 2023-11-26 01:18:18,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3152186.6666666665, ans=0.09899494936611666 2023-11-26 01:18:19,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3152186.6666666665, ans=0.125 2023-11-26 01:18:34,803 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472850 2023-11-26 01:18:55,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3152386.6666666665, ans=0.0 2023-11-26 01:19:07,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3152520.0, ans=0.2 2023-11-26 01:19:08,183 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3950, loss[loss=0.0638, simple_loss=0.09561, pruned_loss=0.007107, audio_tagging_loss=0.008892, over 15417.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08927, pruned_loss=0.0124, audio_tagging_loss=0.009187, over 3048588.67 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:19:09,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2023-11-26 01:19:18,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3152586.6666666665, ans=0.125 2023-11-26 01:19:20,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3152586.6666666665, ans=0.0 2023-11-26 01:19:23,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3152586.6666666665, ans=0.125 2023-11-26 01:19:29,437 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472900 2023-11-26 01:19:39,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3152653.3333333335, ans=0.0 2023-11-26 01:19:42,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.671e+01 9.267e+01 9.996e+01 1.170e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 01:19:42,692 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:19:42,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3152720.0, ans=0.0 2023-11-26 01:20:03,229 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4000, loss[loss=0.06387, simple_loss=0.08614, pruned_loss=0.0102, audio_tagging_loss=0.0106, over 14034.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08998, pruned_loss=0.01252, audio_tagging_loss=0.009189, over 3044915.29 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:20:16,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3152920.0, ans=0.025 2023-11-26 01:20:23,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs.
limit=15.0 2023-11-26 01:20:26,157 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472950 2023-11-26 01:20:41,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=15.0 2023-11-26 01:20:47,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3153120.0, ans=0.1 2023-11-26 01:20:52,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3153120.0, ans=0.125 2023-11-26 01:20:58,609 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4050, loss[loss=0.06498, simple_loss=0.0947, pruned_loss=0.01141, audio_tagging_loss=0.006216, over 15226.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09023, pruned_loss=0.01247, audio_tagging_loss=0.009243, over 3047348.61 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:21:03,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153186.6666666665, ans=0.1 2023-11-26 01:21:03,973 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:21:08,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3153186.6666666665, ans=0.1 2023-11-26 01:21:22,160 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473000 2023-11-26 01:21:31,049 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:21:35,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.884e+01 9.464e+01 1.024e+02 1.208e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 01:21:49,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3153453.3333333335, ans=0.125 2023-11-26 01:21:55,904 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4100, loss[loss=0.0642, simple_loss=0.08358, pruned_loss=0.01465, audio_tagging_loss=0.007758, over 16568.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.0906, pruned_loss=0.01271, audio_tagging_loss=0.009217, over 3055423.54 frames. ], batch size: 64, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:21:59,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.98 vs. 
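limit=15.0

The [scaling.py:1022] Whitening lines are diagnostics from the Whiten modules: each compares a measure of how anisotropic (non-white) the channel covariance of an activation is against a whitening limit, and only when the metric exceeds the limit does the module add a gradient term pushing the covariance back toward a multiple of the identity; metric=2.96 vs. limit=15.0 above is comfortably inside. The exact metric is defined in scaling.py; the toy proxy below only illustrates the idea (the eigenvalue-ratio formula is an assumption, not the real definition):

    import torch

    def whitening_proxy(x: torch.Tensor, num_groups: int = 1) -> float:
        """Toy anisotropy measure for (num_frames, num_channels) activations:
        ~1.0 for a white signal, large when variance piles into few directions."""
        vals = []
        for g in x.chunk(num_groups, dim=-1):
            g = g - g.mean(dim=0)
            cov = (g.t() @ g) / g.shape[0]          # channel covariance
            eigs = torch.linalg.eigvalsh(cov)
            vals.append((eigs.max() / eigs.mean().clamp(min=1e-20)).item())
        return sum(vals) / len(vals)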
limit=12.0 2023-11-26 01:22:04,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3153520.0, ans=0.125 2023-11-26 01:22:17,712 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473050 2023-11-26 01:22:21,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3153653.3333333335, ans=6.0 2023-11-26 01:22:27,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3153653.3333333335, ans=0.05 2023-11-26 01:22:44,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3153786.6666666665, ans=0.2 2023-11-26 01:22:45,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153786.6666666665, ans=0.1 2023-11-26 01:22:51,521 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4150, loss[loss=0.06874, simple_loss=0.09416, pruned_loss=0.01302, audio_tagging_loss=0.008638, over 14803.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09054, pruned_loss=0.01266, audio_tagging_loss=0.009169, over 3050253.49 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:22:52,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-11-26 01:23:13,776 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473100 2023-11-26 01:23:27,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.761e+01 9.353e+01 9.782e+01 1.109e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 01:23:29,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3154053.3333333335, ans=0.0 2023-11-26 01:23:30,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3154053.3333333335, ans=0.125 2023-11-26 01:23:31,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2023-11-26 01:23:32,746 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:23:33,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3154053.3333333335, ans=0.0 2023-11-26 01:23:35,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3154120.0, ans=0.0 2023-11-26 01:23:46,544 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4200, loss[loss=0.08401, simple_loss=0.1141, pruned_loss=0.01831, audio_tagging_loss=0.008669, over 14889.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08946, pruned_loss=0.01255, audio_tagging_loss=0.009097, over 3055190.77 frames. 
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:23:47,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3154186.6666666665, ans=0.125 2023-11-26 01:23:58,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2023-11-26 01:24:03,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3154253.3333333335, ans=0.125 2023-11-26 01:24:10,143 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473150 2023-11-26 01:24:12,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3154320.0, ans=0.125 2023-11-26 01:24:20,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3154386.6666666665, ans=0.0 2023-11-26 01:24:24,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3154386.6666666665, ans=0.2 2023-11-26 01:24:42,951 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4250, loss[loss=0.07974, simple_loss=0.1193, pruned_loss=0.01522, audio_tagging_loss=0.004863, over 15713.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09101, pruned_loss=0.01262, audio_tagging_loss=0.008891, over 3054957.68 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:24:51,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0 2023-11-26 01:24:51,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3154520.0, ans=0.125 2023-11-26 01:25:01,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3154586.6666666665, ans=0.125 2023-11-26 01:25:01,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3154586.6666666665, ans=0.125 2023-11-26 01:25:05,337 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473200 2023-11-26 01:25:19,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.719e+01 9.230e+01 1.020e+02 1.385e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-26 01:25:23,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.26 vs. limit=10.0 2023-11-26 01:25:28,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3154786.6666666665, ans=0.125 2023-11-26 01:25:39,093 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4300, loss[loss=0.06852, simple_loss=0.09989, pruned_loss=0.01218, audio_tagging_loss=0.006399, over 15311.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09067, pruned_loss=0.01241, audio_tagging_loss=0.008793, over 3056488.42 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:25:39,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.65 vs. 
limit=12.0 2023-11-26 01:25:42,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2023-11-26 01:25:50,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.01 vs. limit=12.0 2023-11-26 01:26:01,425 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473250 2023-11-26 01:26:03,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3154986.6666666665, ans=0.5 2023-11-26 01:26:04,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3154986.6666666665, ans=0.125 2023-11-26 01:26:19,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3155053.3333333335, ans=0.125 2023-11-26 01:26:33,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=22.5 2023-11-26 01:26:34,031 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4350, loss[loss=0.04647, simple_loss=0.05531, pruned_loss=0.0073, audio_tagging_loss=0.01152, over 14966.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09133, pruned_loss=0.01254, audio_tagging_loss=0.008731, over 3055942.25 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:26:56,911 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473300 2023-11-26 01:26:59,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3155320.0, ans=0.0 2023-11-26 01:26:59,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3155320.0, ans=0.125 2023-11-26 01:27:09,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.910e+01 8.594e+01 9.290e+01 1.001e+02 1.319e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 01:27:30,058 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4400, loss[loss=0.06199, simple_loss=0.08129, pruned_loss=0.01241, audio_tagging_loss=0.008937, over 15600.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09135, pruned_loss=0.0126, audio_tagging_loss=0.008776, over 3057729.38 frames. 
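], batch size: 57, lr: 1.69e-03, grad_scale: 32.0

grad_scale is the torch.cuda.amp loss scale of this fp16 run: it moves in powers of two (8, 16, 32 and back down) because GradScaler doubles the scale after a stretch of overflow-free steps and halves it when inf/nan gradients appear. Schematic usage of the standard API (model, optimizer, loader and compute_loss are assumed names, not the script's):

    import torch

    scaler = torch.cuda.amp.GradScaler()        # its scale is the grad_scale above

    for batch in train_loader:                  # schematic training loop
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=True):
            loss = compute_loss(model, batch)   # assumed helper
        scaler.scale(loss).backward()
        scaler.step(optimizer)                  # step is skipped on inf/nan grads
        scaler.update()                         # x2 after a clean stretch,
                                                # /2 on overflow: 8, 16, 32, ...
        cur_grad_scale = scaler.get_scale()     # the value printed in the log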
2023-11-26 01:27:43,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3155586.6666666665, ans=0.2 2023-11-26 01:27:52,536 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473350 2023-11-26 01:27:53,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3155653.3333333335, ans=0.125 2023-11-26 01:27:58,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3155653.3333333335, ans=0.0 2023-11-26 01:28:17,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3155786.6666666665, ans=0.0 2023-11-26 01:28:20,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3155786.6666666665, ans=0.125 2023-11-26 01:28:26,503 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4450, loss[loss=0.07657, simple_loss=0.1038, pruned_loss=0.01724, audio_tagging_loss=0.007417, over 14764.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.0916, pruned_loss=0.01286, audio_tagging_loss=0.00872, over 3056523.46 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:28:26,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0 2023-11-26 01:28:36,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3155920.0, ans=0.0 2023-11-26 01:28:43,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3155920.0, ans=0.125 2023-11-26 01:28:44,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3155920.0, ans=0.0 2023-11-26 01:28:44,818 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:28:48,842 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473400 2023-11-26 01:29:02,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.765e+01 9.296e+01 9.987e+01 1.152e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 01:29:04,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3156053.3333333335, ans=0.125 2023-11-26 01:29:13,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3156120.0, ans=0.125 2023-11-26 01:29:22,010 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4500, loss[loss=0.05976, simple_loss=0.08031, pruned_loss=0.01104, audio_tagging_loss=0.008569, over 14907.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09154, pruned_loss=0.01283, audio_tagging_loss=0.008703, over 3057525.90 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:29:39,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.43 vs.
limit=15.0 2023-11-26 01:29:44,840 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473450 2023-11-26 01:29:50,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3156320.0, ans=0.0 2023-11-26 01:29:55,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3156386.6666666665, ans=0.0 2023-11-26 01:30:01,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3156386.6666666665, ans=0.1 2023-11-26 01:30:07,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3156453.3333333335, ans=0.1 2023-11-26 01:30:18,336 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4550, loss[loss=0.07548, simple_loss=0.1015, pruned_loss=0.01872, audio_tagging_loss=0.006035, over 15200.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09146, pruned_loss=0.01289, audio_tagging_loss=0.008706, over 3055195.63 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:30:18,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3156520.0, ans=0.1 2023-11-26 01:30:34,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3156586.6666666665, ans=0.125 2023-11-26 01:30:39,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3156653.3333333335, ans=0.1 2023-11-26 01:30:40,744 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473500 2023-11-26 01:30:49,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3156653.3333333335, ans=0.125 2023-11-26 01:30:56,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.762e+01 9.232e+01 1.004e+02 1.439e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-26 01:31:02,004 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:31:05,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3156786.6666666665, ans=0.0 2023-11-26 01:31:14,167 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4600, loss[loss=0.04884, simple_loss=0.05733, pruned_loss=0.007037, audio_tagging_loss=0.01314, over 14898.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.0912, pruned_loss=0.01293, audio_tagging_loss=0.008794, over 3057977.68 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:31:30,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3156920.0, ans=0.1 2023-11-26 01:31:32,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.64 vs. 
limit=15.0 2023-11-26 01:31:34,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3156920.0, ans=0.125 2023-11-26 01:31:36,589 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473550 2023-11-26 01:31:51,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3157053.3333333335, ans=0.125 2023-11-26 01:31:51,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3157053.3333333335, ans=0.2 2023-11-26 01:32:06,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2023-11-26 01:32:06,978 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:32:10,070 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4650, loss[loss=0.06573, simple_loss=0.07844, pruned_loss=0.01581, audio_tagging_loss=0.0107, over 14092.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09005, pruned_loss=0.01268, audio_tagging_loss=0.008892, over 3052653.31 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:32:22,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3157253.3333333335, ans=0.025 2023-11-26 01:32:25,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3157253.3333333335, ans=0.125 2023-11-26 01:32:32,761 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473600 2023-11-26 01:32:47,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 8.767e+01 9.190e+01 1.011e+02 1.594e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-26 01:32:56,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3157453.3333333335, ans=0.2 2023-11-26 01:33:04,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3157453.3333333335, ans=0.125 2023-11-26 01:33:06,568 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4700, loss[loss=0.07656, simple_loss=0.1111, pruned_loss=0.01281, audio_tagging_loss=0.008193, over 15408.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09064, pruned_loss=0.01273, audio_tagging_loss=0.008974, over 3054129.19 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:33:10,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3157520.0, ans=0.125 2023-11-26 01:33:12,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3157520.0, ans=0.025 2023-11-26 01:33:15,956 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:33:22,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3157586.6666666665, ans=0.0 2023-11-26 01:33:28,372 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473650 2023-11-26 01:33:44,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3157720.0, ans=0.125 2023-11-26 01:33:53,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3157786.6666666665, ans=0.2 2023-11-26 01:34:02,327 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4750, loss[loss=0.05044, simple_loss=0.06039, pruned_loss=0.009151, audio_tagging_loss=0.0111, over 15395.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09028, pruned_loss=0.0128, audio_tagging_loss=0.009076, over 3048302.72 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:34:07,786 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:34:12,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=22.5 2023-11-26 01:34:16,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3157920.0, ans=0.05 2023-11-26 01:34:17,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3157920.0, ans=0.125 2023-11-26 01:34:24,139 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473700 2023-11-26 01:34:40,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.549e+01 9.197e+01 1.001e+02 1.331e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 01:34:46,284 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:34:52,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3158120.0, ans=0.125 2023-11-26 01:34:57,739 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4800, loss[loss=0.08675, simple_loss=0.123, pruned_loss=0.01778, audio_tagging_loss=0.007466, over 16559.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08981, pruned_loss=0.01265, audio_tagging_loss=0.009104, over 3046333.92 frames. 
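], batch size: 60, lr: 1.69e-03, grad_scale: 16.0

The ubiquitous [scaling.py:213] lines print ScheduledFloat values: Zipformer hyper-parameters (dropout p, skip rates, balancer probs, bypass scale_min, whitening limits) that are piecewise-linear functions of batch_count rather than constants, with ans as the current value. By batch_count ~3.16e6 every schedule has long since reached its final point, which is why the same values (0.125, 0.2, 0.1, 0.0) repeat. A simplified sketch of the interpolation, in the spirit of scaling.py:

    def scheduled_float(batch_count: float, *points: tuple) -> float:
        """Piecewise-linear schedule given (batch_count, value) breakpoints."""
        pts = sorted(points)
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return pts[-1][1]                      # past the last breakpoint: flat

    # A dropout decaying 0.3 -> 0.1 over the first 20k batches is flat here:
    # scheduled_float(3158320.0, (0.0, 0.3), (20000.0, 0.1)) -> 0.1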
2023-11-26 01:35:20,587 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473750 2023-11-26 01:35:20,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3158320.0, ans=0.0 2023-11-26 01:35:29,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3158320.0, ans=0.05 2023-11-26 01:35:31,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3158386.6666666665, ans=0.125 2023-11-26 01:35:48,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3158453.3333333335, ans=0.0 2023-11-26 01:35:54,532 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4850, loss[loss=0.04426, simple_loss=0.05523, pruned_loss=0.005791, audio_tagging_loss=0.01085, over 14767.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08921, pruned_loss=0.01247, audio_tagging_loss=0.009258, over 3045656.62 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:36:02,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3158520.0, ans=0.125 2023-11-26 01:36:11,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3158586.6666666665, ans=0.1 2023-11-26 01:36:16,370 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473800 2023-11-26 01:36:30,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3158720.0, ans=0.125 2023-11-26 01:36:32,058 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.661e+01 9.165e+01 9.886e+01 1.284e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 01:36:44,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3158786.6666666665, ans=0.125 2023-11-26 01:36:49,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3158853.3333333335, ans=0.1 2023-11-26 01:36:50,748 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4900, loss[loss=0.05808, simple_loss=0.07772, pruned_loss=0.01, audio_tagging_loss=0.009217, over 15024.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08933, pruned_loss=0.01258, audio_tagging_loss=0.009204, over 3044391.31 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:36:53,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3158853.3333333335, ans=0.125 2023-11-26 01:37:12,766 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473850 2023-11-26 01:37:31,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0 2023-11-26 01:37:40,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2023-11-26 01:37:46,067 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4950, loss[loss=0.07792, simple_loss=0.1016, pruned_loss=0.01345, audio_tagging_loss=0.01368, over 15483.00 frames.
], tot_loss[loss=0.06627, simple_loss=0.0893, pruned_loss=0.01252, audio_tagging_loss=0.009098, over 3042340.67 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:38:09,201 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473900 2023-11-26 01:38:18,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3159320.0, ans=0.1 2023-11-26 01:38:24,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.576e+01 9.071e+01 1.006e+02 1.445e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-26 01:38:41,913 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5000, loss[loss=0.06507, simple_loss=0.09331, pruned_loss=0.01163, audio_tagging_loss=0.006788, over 16047.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08984, pruned_loss=0.01251, audio_tagging_loss=0.008952, over 3047043.51 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:39:01,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3159586.6666666665, ans=0.125 2023-11-26 01:39:04,610 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473950 2023-11-26 01:39:13,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3159653.3333333335, ans=0.2 2023-11-26 01:39:14,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=12.0 2023-11-26 01:39:18,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.36 vs. limit=5.0 2023-11-26 01:39:20,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3159720.0, ans=0.0 2023-11-26 01:39:23,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0 2023-11-26 01:39:38,320 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5050, loss[loss=0.03662, simple_loss=0.04508, pruned_loss=0.00411, audio_tagging_loss=0.009971, over 14390.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08888, pruned_loss=0.01239, audio_tagging_loss=0.008967, over 3044846.87 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:39:41,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3159853.3333333335, ans=0.2 2023-11-26 01:39:49,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3159920.0, ans=0.2 2023-11-26 01:39:56,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.42 vs. 
limit=12.0 2023-11-26 01:39:59,849 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474000 2023-11-26 01:40:12,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3160053.3333333335, ans=0.125 2023-11-26 01:40:12,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3160053.3333333335, ans=0.05 2023-11-26 01:40:14,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3160053.3333333335, ans=0.0 2023-11-26 01:40:16,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 8.830e+01 9.284e+01 9.877e+01 1.399e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 01:40:21,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3160053.3333333335, ans=0.2 2023-11-26 01:40:33,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3160186.6666666665, ans=0.0 2023-11-26 01:40:34,115 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5100, loss[loss=0.07902, simple_loss=0.1098, pruned_loss=0.01523, audio_tagging_loss=0.008881, over 16209.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08832, pruned_loss=0.01215, audio_tagging_loss=0.008986, over 3042558.50 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:40:35,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3160186.6666666665, ans=0.125 2023-11-26 01:40:54,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2023-11-26 01:40:55,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3160320.0, ans=0.0 2023-11-26 01:40:56,380 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474050 2023-11-26 01:41:13,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3160386.6666666665, ans=0.2 2023-11-26 01:41:23,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-26 01:41:28,756 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5150, loss[loss=0.0687, simple_loss=0.09026, pruned_loss=0.0155, audio_tagging_loss=0.008067, over 15051.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08806, pruned_loss=0.01222, audio_tagging_loss=0.008992, over 3046087.51 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:41:35,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3160520.0, ans=0.2 2023-11-26 01:41:51,718 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474100 2023-11-26 01:41:54,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.20 vs. 
2023-11-26 01:41:59,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3160653.3333333335, ans=0.125 2023-11-26 01:42:02,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3160720.0, ans=0.1 2023-11-26 01:42:07,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.876e+01 9.273e+01 9.906e+01 1.225e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 01:42:25,187 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5200, loss[loss=0.06237, simple_loss=0.08091, pruned_loss=0.01165, audio_tagging_loss=0.01026, over 14478.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08881, pruned_loss=0.01234, audio_tagging_loss=0.008959, over 3039642.80 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:42:34,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3160853.3333333335, ans=0.125 2023-11-26 01:42:39,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3160920.0, ans=0.0 2023-11-26 01:42:46,942 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474150 2023-11-26 01:42:53,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0 2023-11-26 01:42:58,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161053.3333333335, ans=0.1 2023-11-26 01:43:05,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3161053.3333333335, ans=0.05 2023-11-26 01:43:13,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2023-11-26 01:43:13,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=22.5 2023-11-26 01:43:20,743 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5250, loss[loss=0.07724, simple_loss=0.1042, pruned_loss=0.01649, audio_tagging_loss=0.008662, over 14693.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08936, pruned_loss=0.01241, audio_tagging_loss=0.008862, over 3046389.29 frames.
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:43:21,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3161186.6666666665, ans=0.125 2023-11-26 01:43:29,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3161186.6666666665, ans=0.125 2023-11-26 01:43:33,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3161253.3333333335, ans=0.125 2023-11-26 01:43:43,064 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474200 2023-11-26 01:43:47,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3161320.0, ans=0.1 2023-11-26 01:43:53,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3161320.0, ans=0.125 2023-11-26 01:43:55,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3161386.6666666665, ans=0.0 2023-11-26 01:43:58,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3161386.6666666665, ans=0.125 2023-11-26 01:44:00,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.738e+01 9.374e+01 1.008e+02 2.043e+02, threshold=1.875e+02, percent-clipped=1.0 2023-11-26 01:44:01,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3161386.6666666665, ans=0.125 2023-11-26 01:44:10,095 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:44:11,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3161453.3333333335, ans=0.1 2023-11-26 01:44:16,259 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5300, loss[loss=0.07687, simple_loss=0.1051, pruned_loss=0.0152, audio_tagging_loss=0.009105, over 14841.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08959, pruned_loss=0.01248, audio_tagging_loss=0.008863, over 3044574.53 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:44:24,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3161520.0, ans=0.125 2023-11-26 01:44:36,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3161586.6666666665, ans=0.2 2023-11-26 01:44:39,572 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474250 2023-11-26 01:44:41,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3161653.3333333335, ans=0.1 2023-11-26 01:44:41,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3161653.3333333335, ans=0.1 2023-11-26 01:44:45,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161653.3333333335, ans=0.1 2023-11-26 01:44:56,600 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:44:57,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=15.0 2023-11-26 01:44:57,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3161720.0, ans=0.125 2023-11-26 01:45:00,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3161786.6666666665, ans=0.1 2023-11-26 01:45:06,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3161786.6666666665, ans=0.125 2023-11-26 01:45:09,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0 2023-11-26 01:45:12,042 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5350, loss[loss=0.06003, simple_loss=0.07879, pruned_loss=0.01186, audio_tagging_loss=0.008778, over 15003.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09001, pruned_loss=0.0125, audio_tagging_loss=0.008777, over 3044011.29 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:45:13,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3161853.3333333335, ans=0.125 2023-11-26 01:45:34,075 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474300 2023-11-26 01:45:50,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.668e+01 9.369e+01 1.006e+02 1.281e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 01:45:53,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.55 vs. limit=6.0 2023-11-26 01:45:54,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3162053.3333333335, ans=0.1 2023-11-26 01:46:08,075 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5400, loss[loss=0.08128, simple_loss=0.1219, pruned_loss=0.01464, audio_tagging_loss=0.00567, over 15828.00 frames. 
], tot_loss[loss=0.06638, simple_loss=0.08995, pruned_loss=0.0125, audio_tagging_loss=0.008909, over 3051382.52 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:46:19,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3162253.3333333335, ans=0.125 2023-11-26 01:46:29,625 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474350 2023-11-26 01:46:32,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=22.5 2023-11-26 01:47:02,407 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5450, loss[loss=0.07136, simple_loss=0.0894, pruned_loss=0.01956, audio_tagging_loss=0.00711, over 15650.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09018, pruned_loss=0.01257, audio_tagging_loss=0.008861, over 3051240.73 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:47:25,644 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474400 2023-11-26 01:47:39,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3162720.0, ans=0.125 2023-11-26 01:47:41,657 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.837e+01 9.304e+01 1.039e+02 1.325e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 01:47:44,055 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:47:45,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3162720.0, ans=0.125 2023-11-26 01:47:58,526 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5500, loss[loss=0.05651, simple_loss=0.0751, pruned_loss=0.009704, audio_tagging_loss=0.009251, over 15993.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09058, pruned_loss=0.01269, audio_tagging_loss=0.008831, over 3050289.05 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:48:12,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.46 vs. limit=22.5 2023-11-26 01:48:19,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3162920.0, ans=0.125 2023-11-26 01:48:20,978 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474450 2023-11-26 01:48:21,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3162986.6666666665, ans=0.2 2023-11-26 01:48:21,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3162986.6666666665, ans=0.0 2023-11-26 01:48:39,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2023-11-26 01:48:45,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3163120.0, ans=0.125 2023-11-26 01:48:54,817 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5550, loss[loss=0.07223, simple_loss=0.08754, pruned_loss=0.01869, audio_tagging_loss=0.009775, over 14694.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09078, pruned_loss=0.01288, audio_tagging_loss=0.008898, over 3044840.14 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:48:54,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3163186.6666666665, ans=0.0 2023-11-26 01:49:00,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0 2023-11-26 01:49:12,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3163253.3333333335, ans=0.0 2023-11-26 01:49:16,207 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474500 2023-11-26 01:49:33,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3163386.6666666665, ans=0.0 2023-11-26 01:49:34,521 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.884e+01 9.348e+01 1.017e+02 1.220e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 01:49:49,979 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5600, loss[loss=0.05037, simple_loss=0.06182, pruned_loss=0.0089, audio_tagging_loss=0.01056, over 14974.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09046, pruned_loss=0.01291, audio_tagging_loss=0.009014, over 3049644.73 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:50:00,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3163586.6666666665, ans=0.2 2023-11-26 01:50:05,132 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:50:07,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3163586.6666666665, ans=0.125 2023-11-26 01:50:07,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3163586.6666666665, ans=0.0 2023-11-26 01:50:12,226 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474550 2023-11-26 01:50:16,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2023-11-26 01:50:31,296 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:50:35,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3163786.6666666665, ans=0.1 2023-11-26 01:50:37,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2023-11-26 01:50:44,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3163853.3333333335, ans=0.0 2023-11-26 01:50:46,145 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5650, loss[loss=0.05919, simple_loss=0.07742, pruned_loss=0.01071, audio_tagging_loss=0.009771, over 15276.00 frames. 
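], tot_loss[loss=0.06688, simple_loss=0.09003, pruned_loss=0.0128, audio_tagging_loss=0.009067, over 3052524.32 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0

The `WARNING [train_asr.py:1481]` entry above drops an AudioSet clip whose placeholder transcript is longer in BPE tokens (24) than the number of encoder frames left after 4x subsampling (100 -> 23); the pruned transducer loss cannot align a token sequence longer than the frame sequence. A sketch of that kind of guard; the function name and the exact subsampling arithmetic are assumptions, chosen only to match the logged 100 -> 23:

```python
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    """Drop cuts with fewer post-subsampling frames than BPE tokens,
    since the transducer loss cannot align them (sketch; the recipe's
    exact convolution arithmetic and inequality may differ)."""
    frames_after = (num_frames - 8) // subsampling_factor  # 100 -> 23, as logged
    return frames_after >= num_tokens

# The excluded cut above: 100 frames -> 23 after subsampling, 24 tokens.
print(keep_cut(100, 24))  # False: the cut is excluded from training
```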
2023-11-26 01:51:09,083 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474600 2023-11-26 01:51:12,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3163986.6666666665, ans=0.1 2023-11-26 01:51:26,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.458e+01 9.048e+01 9.939e+01 1.364e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-26 01:51:27,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3164053.3333333335, ans=0.125 2023-11-26 01:51:42,947 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5700, loss[loss=0.04396, simple_loss=0.06094, pruned_loss=0.00464, audio_tagging_loss=0.008848, over 14568.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08918, pruned_loss=0.01251, audio_tagging_loss=0.00908, over 3049148.36 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:51:53,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.18 vs. limit=6.0 2023-11-26 01:52:04,733 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474650 2023-11-26 01:52:10,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3164320.0, ans=0.125 2023-11-26 01:52:19,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3164386.6666666665, ans=0.125 2023-11-26 01:52:25,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3164386.6666666665, ans=0.0 2023-11-26 01:52:38,751 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5750, loss[loss=0.0668, simple_loss=0.08884, pruned_loss=0.01335, audio_tagging_loss=0.009027, over 14722.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08901, pruned_loss=0.01246, audio_tagging_loss=0.008958, over 3053303.65 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:52:52,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3164586.6666666665, ans=0.125 2023-11-26 01:52:56,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3164586.6666666665, ans=0.05 2023-11-26 01:53:01,288 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474700 2023-11-26 01:53:01,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=22.5 2023-11-26 01:53:06,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3164653.3333333335, ans=0.125 2023-11-26 01:53:12,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3164720.0, ans=0.04949747468305833 2023-11-26 01:53:18,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.47 vs.
limit=15.0 2023-11-26 01:53:20,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.765e+01 9.381e+01 1.013e+02 1.424e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 01:53:33,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3164853.3333333335, ans=0.125 2023-11-26 01:53:34,545 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5800, loss[loss=0.06264, simple_loss=0.08863, pruned_loss=0.01011, audio_tagging_loss=0.008211, over 15282.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08896, pruned_loss=0.01241, audio_tagging_loss=0.008812, over 3053582.02 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:53:35,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3164853.3333333335, ans=0.2 2023-11-26 01:53:42,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3164853.3333333335, ans=0.125 2023-11-26 01:53:43,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3164853.3333333335, ans=0.125 2023-11-26 01:53:51,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3164920.0, ans=0.0 2023-11-26 01:53:57,418 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474750 2023-11-26 01:53:57,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.74 vs. limit=15.0 2023-11-26 01:54:08,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3165053.3333333335, ans=0.025 2023-11-26 01:54:17,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=12.0 2023-11-26 01:54:23,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3165120.0, ans=0.125 2023-11-26 01:54:30,600 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5850, loss[loss=0.09809, simple_loss=0.1421, pruned_loss=0.02125, audio_tagging_loss=0.005796, over 15331.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08959, pruned_loss=0.01259, audio_tagging_loss=0.008793, over 3048103.46 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:54:52,371 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474800 2023-11-26 01:54:59,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3165320.0, ans=0.125 2023-11-26 01:55:02,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3165320.0, ans=0.0 2023-11-26 01:55:11,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.672e+01 9.332e+01 1.005e+02 2.095e+02, threshold=1.866e+02, percent-clipped=1.0 2023-11-26 01:55:15,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.65 vs. 
limit=22.5 2023-11-26 01:55:19,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3165453.3333333335, ans=0.125 2023-11-26 01:55:26,150 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5900, loss[loss=0.06411, simple_loss=0.08787, pruned_loss=0.008081, audio_tagging_loss=0.01209, over 14793.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09022, pruned_loss=0.01263, audio_tagging_loss=0.008816, over 3047286.30 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:55:48,402 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474850 2023-11-26 01:55:52,736 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:55:57,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3165653.3333333335, ans=0.125 2023-11-26 01:56:05,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=15.0 2023-11-26 01:56:17,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3165786.6666666665, ans=0.125 2023-11-26 01:56:21,181 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5950, loss[loss=0.06945, simple_loss=0.09682, pruned_loss=0.01182, audio_tagging_loss=0.00922, over 16637.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.0894, pruned_loss=0.01244, audio_tagging_loss=0.008805, over 3045910.33 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:56:32,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3165920.0, ans=0.1 2023-11-26 01:56:41,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2023-11-26 01:56:43,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3165986.6666666665, ans=0.0 2023-11-26 01:56:44,130 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474900 2023-11-26 01:56:47,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.36 vs. limit=15.0 2023-11-26 01:56:59,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3166053.3333333335, ans=0.1 2023-11-26 01:56:59,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0 2023-11-26 01:57:02,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.504e+01 9.073e+01 9.680e+01 1.067e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-26 01:57:17,305 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6000, loss[loss=0.07848, simple_loss=0.1015, pruned_loss=0.01953, audio_tagging_loss=0.008211, over 16716.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08916, pruned_loss=0.0125, audio_tagging_loss=0.008828, over 3045687.97 frames. 
], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:57:17,306 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 01:57:32,239 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6241, 2.8762, 2.9381, 2.4673, 3.1821, 3.1127, 3.2151, 3.1230], device='cuda:1') 2023-11-26 01:57:36,378 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4618, 3.7608, 4.3982, 3.5965], device='cuda:1') 2023-11-26 01:57:41,750 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8215, 4.9569, 5.0857, 4.8704], device='cuda:1') 2023-11-26 01:57:49,491 INFO [train_asr.py:1267] (1/4) Epoch 40, validation: loss=0.0577, simple_loss=0.05067, pruned_loss=0.005162, audio_tagging_loss=0.0272, over 4681554.00 frames. 2023-11-26 01:57:49,492 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 01:57:52,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3166186.6666666665, ans=0.0 2023-11-26 01:58:05,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3166253.3333333335, ans=0.95 2023-11-26 01:58:12,981 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474950 2023-11-26 01:58:17,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-26 01:58:17,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=22.5 2023-11-26 01:58:25,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0 2023-11-26 01:58:30,855 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:58:33,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3166453.3333333335, ans=0.0 2023-11-26 01:58:45,672 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6050, loss[loss=0.08389, simple_loss=0.1129, pruned_loss=0.0191, audio_tagging_loss=0.00833, over 17284.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.0904, pruned_loss=0.01266, audio_tagging_loss=0.008693, over 3044419.28 frames. 
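], batch size: 63, lr: 1.69e-03, grad_scale: 16.0

Each validation pass above also dumps `attn_weights_entropy` tensors from `[zipformer.py:1877]`: the entropy of every self-attention layer's weight distribution, one value per head. This is a cheap health check on the encoder; entropy near zero means a head has collapsed onto single keys, while values near the log of the key count mean it attends almost uniformly. A sketch of that quantity, assuming attention weights of shape (num_heads, query_len, key_len); the function is illustrative, not the zipformer code:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """Mean per-head entropy of attention rows (each row sums to 1)."""
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, query_len)
    return ent.mean(dim=-1)                         # one scalar per head

attn = torch.softmax(torch.randn(4, 10, 50), dim=-1)
print(attn_weights_entropy(attn))  # a bit below the log(50) ~ 3.9 uniform ceiling
```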
2023-11-26 01:58:48,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3166520.0, ans=0.125 2023-11-26 01:58:55,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3166520.0, ans=0.1 2023-11-26 01:59:08,353 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475000 2023-11-26 01:59:08,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. limit=15.0 2023-11-26 01:59:21,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2023-11-26 01:59:27,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.819e+01 9.341e+01 9.960e+01 1.201e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 01:59:29,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3166786.6666666665, ans=0.2 2023-11-26 01:59:32,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3166786.6666666665, ans=0.0 2023-11-26 01:59:38,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3166786.6666666665, ans=0.0 2023-11-26 01:59:42,382 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6100, loss[loss=0.04938, simple_loss=0.06398, pruned_loss=0.008366, audio_tagging_loss=0.009023, over 14944.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09018, pruned_loss=0.01273, audio_tagging_loss=0.008741, over 3041914.05 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:59:44,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3166853.3333333335, ans=0.0 2023-11-26 01:59:47,879 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:59:51,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3166853.3333333335, ans=0.0 2023-11-26 01:59:59,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3166920.0, ans=0.125 2023-11-26 02:00:01,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3166920.0, ans=0.125 2023-11-26 02:00:04,232 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475050 2023-11-26 02:00:15,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3167053.3333333335, ans=0.125 2023-11-26 02:00:19,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3167053.3333333335, ans=0.95 2023-11-26 02:00:37,605 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6150, loss[loss=0.06017, simple_loss=0.08578, pruned_loss=0.009996, audio_tagging_loss=0.007286, over 16179.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09106, pruned_loss=0.01273, audio_tagging_loss=0.008756, over 3037278.64 frames.
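], batch size: 63, lr: 1.69e-03, grad_scale: 16.0

The ubiquitous `[scaling.py:213]` records print `ScheduledFloat` values: regularization hyperparameters (dropout probabilities, skip rates, balancer probabilities) that are functions of `batch_count` rather than constants, which is why each record carries both the batch count and the current `ans`. A minimal sketch of such a schedule, assuming a piecewise-linear shape; the class and the breakpoints below are invented for illustration (by batch_count ~3.17M every schedule here has long since settled at its final value):

```python
class ScheduledFloatSketch:
    """Piecewise-linear value of batch_count (illustrative only)."""

    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint, hold the final value

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3167186.67))  # -> 0.1, matching the ans=0.1 entries above
```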
2023-11-26 02:00:37,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3167186.6666666665, ans=0.125 2023-11-26 02:00:41,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=22.5 2023-11-26 02:00:49,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3167253.3333333335, ans=0.2 2023-11-26 02:00:55,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3167253.3333333335, ans=0.125 2023-11-26 02:00:58,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3167253.3333333335, ans=0.125 2023-11-26 02:01:00,322 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475100 2023-11-26 02:01:05,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3167320.0, ans=0.0 2023-11-26 02:01:06,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3167320.0, ans=0.2 2023-11-26 02:01:18,780 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.792e+01 9.265e+01 9.786e+01 1.351e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 02:01:26,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3167453.3333333335, ans=0.09899494936611666 2023-11-26 02:01:33,675 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6200, loss[loss=0.06467, simple_loss=0.09991, pruned_loss=0.007157, audio_tagging_loss=0.007561, over 15312.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09116, pruned_loss=0.01278, audio_tagging_loss=0.008776, over 3040327.49 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:01:43,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=15.0 2023-11-26 02:01:45,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3167586.6666666665, ans=0.125 2023-11-26 02:01:50,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3167586.6666666665, ans=0.125 2023-11-26 02:01:56,156 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475150 2023-11-26 02:02:15,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3167720.0, ans=0.0 2023-11-26 02:02:18,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.47 vs. limit=15.0 2023-11-26 02:02:30,246 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6250, loss[loss=0.07112, simple_loss=0.1007, pruned_loss=0.01181, audio_tagging_loss=0.00894, over 14723.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09086, pruned_loss=0.01272, audio_tagging_loss=0.00889, over 3041638.78 frames.
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:02:41,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3167920.0, ans=0.0 2023-11-26 02:02:51,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475200 2023-11-26 02:02:53,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.48 vs. limit=6.0 2023-11-26 02:02:59,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=22.5 2023-11-26 02:03:11,825 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.743e+01 9.437e+01 1.009e+02 1.277e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 02:03:25,458 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6300, loss[loss=0.09582, simple_loss=0.1469, pruned_loss=0.01835, audio_tagging_loss=0.004007, over 15847.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09035, pruned_loss=0.0126, audio_tagging_loss=0.008916, over 3043817.23 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:03:33,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3168186.6666666665, ans=0.0 2023-11-26 02:03:38,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3168253.3333333335, ans=0.2 2023-11-26 02:03:44,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3168253.3333333335, ans=0.0 2023-11-26 02:03:48,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475250 2023-11-26 02:04:14,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3168453.3333333335, ans=0.125 2023-11-26 02:04:18,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3168453.3333333335, ans=0.125 2023-11-26 02:04:20,899 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6350, loss[loss=0.07134, simple_loss=0.1008, pruned_loss=0.0143, audio_tagging_loss=0.006636, over 14831.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08921, pruned_loss=0.01241, audio_tagging_loss=0.00907, over 3039646.29 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:04:44,219 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475300 2023-11-26 02:04:47,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3168653.3333333335, ans=0.0 2023-11-26 02:05:02,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.674e+01 9.185e+01 1.006e+02 1.507e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 02:05:10,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3168786.6666666665, ans=0.0 2023-11-26 02:05:17,597 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6400, loss[loss=0.08524, simple_loss=0.1114, pruned_loss=0.01939, audio_tagging_loss=0.01015, over 15857.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08971, pruned_loss=0.01248, audio_tagging_loss=0.009231, over 3036320.00 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:05:38,914 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475350 2023-11-26 02:05:39,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3168986.6666666665, ans=0.1 2023-11-26 02:05:49,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3169053.3333333335, ans=0.125 2023-11-26 02:05:50,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0 2023-11-26 02:05:57,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=15.0 2023-11-26 02:06:12,563 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6450, loss[loss=0.07034, simple_loss=0.08212, pruned_loss=0.01706, audio_tagging_loss=0.01222, over 15429.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08914, pruned_loss=0.01239, audio_tagging_loss=0.00932, over 3039732.61 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:06:13,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3169186.6666666665, ans=0.04949747468305833 2023-11-26 02:06:33,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3169320.0, ans=0.0 2023-11-26 02:06:34,470 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475400 2023-11-26 02:06:37,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0 2023-11-26 02:06:55,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.666e+01 9.241e+01 9.984e+01 1.381e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 02:07:08,102 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6500, loss[loss=0.07804, simple_loss=0.1104, pruned_loss=0.01457, audio_tagging_loss=0.008249, over 15804.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08918, pruned_loss=0.01244, audio_tagging_loss=0.009243, over 3043251.84 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:07:08,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3169520.0, ans=0.125 2023-11-26 02:07:17,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3169520.0, ans=0.125 2023-11-26 02:07:20,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3169586.6666666665, ans=0.0 2023-11-26 02:07:31,668 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475450 2023-11-26 02:07:32,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2023-11-26 02:07:32,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.29 vs. 
limit=10.0 2023-11-26 02:07:33,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3169653.3333333335, ans=0.125 2023-11-26 02:07:41,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3169720.0, ans=0.125 2023-11-26 02:07:45,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.80 vs. limit=6.0 2023-11-26 02:07:58,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=15.0 2023-11-26 02:08:00,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=22.5 2023-11-26 02:08:04,751 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6550, loss[loss=0.08165, simple_loss=0.1134, pruned_loss=0.01554, audio_tagging_loss=0.009417, over 16019.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08935, pruned_loss=0.01256, audio_tagging_loss=0.009147, over 3044870.80 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:08:09,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3169853.3333333335, ans=10.0 2023-11-26 02:08:12,606 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:08:18,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3169920.0, ans=0.5 2023-11-26 02:08:27,227 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475500 2023-11-26 02:08:37,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.10 vs. limit=15.0 2023-11-26 02:08:47,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.547e+01 9.134e+01 1.014e+02 1.239e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 02:08:55,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3170120.0, ans=0.0 2023-11-26 02:09:00,761 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6600, loss[loss=0.07371, simple_loss=0.102, pruned_loss=0.01499, audio_tagging_loss=0.0077, over 15771.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08871, pruned_loss=0.01242, audio_tagging_loss=0.009038, over 3046905.59 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:09:00,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3170186.6666666665, ans=0.07 2023-11-26 02:09:11,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.59 vs. 
limit=15.0 2023-11-26 02:09:20,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3170253.3333333335, ans=0.125 2023-11-26 02:09:22,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475550 2023-11-26 02:09:55,782 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6650, loss[loss=0.05636, simple_loss=0.07612, pruned_loss=0.01053, audio_tagging_loss=0.007763, over 14635.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08854, pruned_loss=0.0124, audio_tagging_loss=0.008956, over 3038772.31 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:10:08,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3170586.6666666665, ans=0.125 2023-11-26 02:10:19,358 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475600 2023-11-26 02:10:30,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3170720.0, ans=0.07 2023-11-26 02:10:31,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2023-11-26 02:10:38,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.810e+01 9.306e+01 1.020e+02 1.538e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 02:10:43,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=12.0 2023-11-26 02:10:47,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3170786.6666666665, ans=0.125 2023-11-26 02:10:52,577 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6700, loss[loss=0.05775, simple_loss=0.07881, pruned_loss=0.009918, audio_tagging_loss=0.008433, over 16871.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08888, pruned_loss=0.01244, audio_tagging_loss=0.00886, over 3041762.59 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:10:59,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3170853.3333333335, ans=0.0 2023-11-26 02:11:03,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3170920.0, ans=0.05 2023-11-26 02:11:14,716 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475650 2023-11-26 02:11:28,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3171053.3333333335, ans=0.125 2023-11-26 02:11:48,604 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6750, loss[loss=0.07491, simple_loss=0.09984, pruned_loss=0.01492, audio_tagging_loss=0.01007, over 14344.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.0885, pruned_loss=0.01234, audio_tagging_loss=0.00891, over 3040693.92 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:11:57,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. 
limit=15.0 2023-11-26 02:11:58,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3171253.3333333335, ans=0.125 2023-11-26 02:11:58,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.34 vs. limit=6.0 2023-11-26 02:12:01,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3171253.3333333335, ans=0.125 2023-11-26 02:12:05,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0 2023-11-26 02:12:08,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-26 02:12:10,193 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475700 2023-11-26 02:12:30,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.571e+01 9.016e+01 1.004e+02 1.567e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-26 02:12:43,785 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6800, loss[loss=0.05018, simple_loss=0.0685, pruned_loss=0.006972, audio_tagging_loss=0.008957, over 14637.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08983, pruned_loss=0.0125, audio_tagging_loss=0.008781, over 3040159.74 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:12:49,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3171520.0, ans=0.1 2023-11-26 02:13:06,603 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475750 2023-11-26 02:13:12,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3171653.3333333335, ans=0.0 2023-11-26 02:13:37,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3171786.6666666665, ans=0.1 2023-11-26 02:13:39,501 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6850, loss[loss=0.05841, simple_loss=0.08464, pruned_loss=0.007316, audio_tagging_loss=0.008776, over 15692.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08948, pruned_loss=0.01239, audio_tagging_loss=0.008808, over 3042062.12 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:13:52,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3171920.0, ans=0.125 2023-11-26 02:13:54,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0 2023-11-26 02:13:59,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3171920.0, ans=0.125 2023-11-26 02:14:01,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3171986.6666666665, ans=0.0 2023-11-26 02:14:02,220 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475800 2023-11-26 02:14:15,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.77 vs. 
limit=22.5 2023-11-26 02:14:18,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3172053.3333333335, ans=0.0 2023-11-26 02:14:22,091 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 8.606e+01 9.440e+01 1.002e+02 1.257e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 02:14:24,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2023-11-26 02:14:27,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3172120.0, ans=0.125 2023-11-26 02:14:28,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3172120.0, ans=0.1 2023-11-26 02:14:35,826 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6900, loss[loss=0.06989, simple_loss=0.1021, pruned_loss=0.01245, audio_tagging_loss=0.00639, over 15883.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09013, pruned_loss=0.01235, audio_tagging_loss=0.008721, over 3043022.38 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:14:50,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3172253.3333333335, ans=0.125 2023-11-26 02:14:58,175 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475850 2023-11-26 02:15:00,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3172320.0, ans=0.125 2023-11-26 02:15:04,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3172320.0, ans=0.125 2023-11-26 02:15:04,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3172320.0, ans=0.0 2023-11-26 02:15:14,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3172386.6666666665, ans=0.125 2023-11-26 02:15:19,168 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:15:23,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3172453.3333333335, ans=0.2 2023-11-26 02:15:31,309 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6950, loss[loss=0.06518, simple_loss=0.09013, pruned_loss=0.0129, audio_tagging_loss=0.007215, over 15407.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09006, pruned_loss=0.01243, audio_tagging_loss=0.00879, over 3045313.54 frames. 
2023-11-26 02:15:31,309 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6950, loss[loss=0.06518, simple_loss=0.09013, pruned_loss=0.0129, audio_tagging_loss=0.007215, over 15407.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09006, pruned_loss=0.01243, audio_tagging_loss=0.00879, over 3045313.54 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:15:34,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3172520.0, ans=0.125
2023-11-26 02:15:38,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3172520.0, ans=0.125
2023-11-26 02:15:50,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0
2023-11-26 02:15:51,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3172586.6666666665, ans=0.0
2023-11-26 02:15:54,177 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475900
2023-11-26 02:15:54,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3172653.3333333335, ans=0.1
2023-11-26 02:15:58,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3172653.3333333335, ans=0.0
2023-11-26 02:16:01,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3172653.3333333335, ans=0.0
2023-11-26 02:16:08,208 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 02:16:11,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3172720.0, ans=0.5
2023-11-26 02:16:11,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3172720.0, ans=0.125
2023-11-26 02:16:14,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.522e+01 9.109e+01 9.823e+01 1.262e+02, threshold=1.822e+02, percent-clipped=0.0
2023-11-26 02:16:26,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=12.0
2023-11-26 02:16:26,955 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7000, loss[loss=0.06584, simple_loss=0.09564, pruned_loss=0.01134, audio_tagging_loss=0.006678, over 14969.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09002, pruned_loss=0.01246, audio_tagging_loss=0.008849, over 3037687.45 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:16:27,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3172853.3333333335, ans=0.1
2023-11-26 02:16:42,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3172920.0, ans=0.125
2023-11-26 02:16:49,055 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475950
2023-11-26 02:17:20,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.76 vs. limit=15.0
2023-11-26 02:17:22,601 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7050, loss[loss=0.07627, simple_loss=0.1126, pruned_loss=0.01265, audio_tagging_loss=0.007319, over 16914.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08995, pruned_loss=0.01239, audio_tagging_loss=0.008852, over 3034731.46 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:17:28,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0
2023-11-26 02:17:36,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3173253.3333333335, ans=0.09899494936611666
2023-11-26 02:17:40,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3173253.3333333335, ans=0.1
2023-11-26 02:17:44,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476000
2023-11-26 02:17:54,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3173320.0, ans=0.125
2023-11-26 02:18:08,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.411e+01 9.041e+01 9.968e+01 1.223e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-26 02:18:18,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.17 vs. limit=10.0
2023-11-26 02:18:18,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3173520.0, ans=0.125
2023-11-26 02:18:19,682 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7100, loss[loss=0.07725, simple_loss=0.1121, pruned_loss=0.01384, audio_tagging_loss=0.007344, over 14492.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09054, pruned_loss=0.01246, audio_tagging_loss=0.008882, over 3040303.79 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:18:35,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3173586.6666666665, ans=0.125
2023-11-26 02:18:42,523 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476050
2023-11-26 02:18:44,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3173653.3333333335, ans=0.125
2023-11-26 02:18:50,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3173653.3333333335, ans=0.125
2023-11-26 02:18:54,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3173720.0, ans=0.0
2023-11-26 02:18:58,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3173720.0, ans=0.125
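In the train_asr.py:1235 progress lines, the logged totals are consistent with a fixed weighting of the loss components, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 and the implicit 1.0 weights are inferred from the numbers themselves, not read from the code. A quick check against the Epoch 40, batch 7100 entry above:

# Reproduce tot_loss for Epoch 40, batch 7100 from its logged components.
simple_loss, pruned_loss, audio_tagging_loss = 0.09054, 0.01246, 0.008882
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(f"{loss:.4g}")  # prints 0.06661, matching the logged tot_loss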
2023-11-26 02:19:15,648 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7150, loss[loss=0.07147, simple_loss=0.09678, pruned_loss=0.01112, audio_tagging_loss=0.01196, over 15107.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09053, pruned_loss=0.01245, audio_tagging_loss=0.00904, over 3040285.66 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:19:21,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3173853.3333333335, ans=0.95
2023-11-26 02:19:27,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3173920.0, ans=0.0
2023-11-26 02:19:37,929 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476100
2023-11-26 02:19:41,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3173986.6666666665, ans=0.1
2023-11-26 02:19:43,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3173986.6666666665, ans=0.125
2023-11-26 02:19:58,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.796e+01 9.304e+01 9.946e+01 1.523e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-26 02:20:11,675 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7200, loss[loss=0.05377, simple_loss=0.07647, pruned_loss=0.008976, audio_tagging_loss=0.006563, over 15444.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09142, pruned_loss=0.01263, audio_tagging_loss=0.009005, over 3055221.89 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:20:12,902 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 02:20:12,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3174186.6666666665, ans=0.0
2023-11-26 02:20:15,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3174186.6666666665, ans=0.125
2023-11-26 02:20:17,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3174186.6666666665, ans=0.125
2023-11-26 02:20:22,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3174253.3333333335, ans=0.0
2023-11-26 02:20:33,553 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476150
2023-11-26 02:20:33,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3174320.0, ans=0.0
2023-11-26 02:20:37,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3174320.0, ans=0.2
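Each optim.py:476 entry lists five quantiles of recent gradient norms (min, 25%, median, 75%, max) together with the active clipping threshold, and in every entry here the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.304e+01 = 1.861e+02 just above. A sketch of that relation; how optim.py actually collects the norm history is an assumption:

import statistics

def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
    # Gradients whose norm exceeds clipping_scale * median(recent norms)
    # get clipped; percent-clipped in the log is how often that happens.
    return clipping_scale * statistics.median(recent_grad_norms)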
2023-11-26 02:21:06,654 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7250, loss[loss=0.08469, simple_loss=0.1193, pruned_loss=0.01692, audio_tagging_loss=0.008135, over 15073.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09132, pruned_loss=0.01256, audio_tagging_loss=0.00899, over 3053465.95 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:21:06,962 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 02:21:25,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3174586.6666666665, ans=0.0
2023-11-26 02:21:29,710 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476200
2023-11-26 02:21:51,693 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.421e+01 9.196e+01 9.750e+01 1.203e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-26 02:22:02,896 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7300, loss[loss=0.06892, simple_loss=0.09204, pruned_loss=0.01604, audio_tagging_loss=0.006863, over 15170.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09034, pruned_loss=0.01241, audio_tagging_loss=0.009007, over 3052595.88 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:22:03,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3174853.3333333335, ans=0.125
2023-11-26 02:22:18,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3174920.0, ans=0.015
2023-11-26 02:22:25,735 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476250
2023-11-26 02:22:37,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3175053.3333333335, ans=0.1
2023-11-26 02:22:51,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3175120.0, ans=0.0
2023-11-26 02:22:58,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5
2023-11-26 02:22:58,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5
2023-11-26 02:22:58,973 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7350, loss[loss=0.0655, simple_loss=0.09014, pruned_loss=0.01354, audio_tagging_loss=0.006895, over 14041.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09053, pruned_loss=0.01259, audio_tagging_loss=0.008845, over 3045056.19 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 02:23:06,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3175186.6666666665, ans=0.125
2023-11-26 02:23:09,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3175253.3333333335, ans=0.125
2023-11-26 02:23:11,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3175253.3333333335, ans=0.125
2023-11-26 02:23:20,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476300
2023-11-26 02:23:44,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.015e+01 8.628e+01 9.246e+01 9.778e+01 1.248e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-26 02:23:54,427 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7400, loss[loss=0.08376, simple_loss=0.1183, pruned_loss=0.01951, audio_tagging_loss=0.005114, over 14757.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09105, pruned_loss=0.01276, audio_tagging_loss=0.008801, over 3040175.21 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 02:24:01,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0
2023-11-26 02:24:06,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3175586.6666666665, ans=0.125
2023-11-26 02:24:06,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3175586.6666666665, ans=0.125
2023-11-26 02:24:16,198 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476350
2023-11-26 02:24:21,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3175653.3333333335, ans=0.0
2023-11-26 02:24:21,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3175653.3333333335, ans=0.125
2023-11-26 02:24:29,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3175720.0, ans=0.1
2023-11-26 02:24:31,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3175720.0, ans=0.125
2023-11-26 02:24:49,165 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7450, loss[loss=0.06948, simple_loss=0.0989, pruned_loss=0.0136, audio_tagging_loss=0.006438, over 14400.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09079, pruned_loss=0.01275, audio_tagging_loss=0.008733, over 3033959.32 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 02:24:57,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3175853.3333333335, ans=0.07
2023-11-26 02:25:12,738 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476400
2023-11-26 02:25:16,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3175986.6666666665, ans=0.1
2023-11-26 02:25:21,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3175986.6666666665, ans=0.1
2023-11-26 02:25:27,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0
2023-11-26 02:25:29,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3176053.3333333335, ans=0.2
2023-11-26 02:25:29,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3176053.3333333335, ans=0.1
2023-11-26 02:25:29,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3176053.3333333335, ans=0.0
2023-11-26 02:25:33,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3176120.0, ans=0.0
2023-11-26 02:25:35,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.103e+01 8.524e+01 9.234e+01 9.933e+01 1.379e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-26 02:25:46,037 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7500, loss[loss=0.06186, simple_loss=0.07632, pruned_loss=0.01341, audio_tagging_loss=0.01029, over 15206.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09063, pruned_loss=0.01284, audio_tagging_loss=0.008847, over 3038558.28 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 02:25:51,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3176186.6666666665, ans=0.125
2023-11-26 02:26:07,836 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476450
2023-11-26 02:26:23,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3176386.6666666665, ans=0.125
2023-11-26 02:26:30,694 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 02:26:40,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3176520.0, ans=0.125
2023-11-26 02:26:41,534 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7550, loss[loss=0.08512, simple_loss=0.119, pruned_loss=0.01639, audio_tagging_loss=0.009256, over 15738.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09077, pruned_loss=0.01284, audio_tagging_loss=0.008829, over 3038061.80 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 02:27:03,132 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476500
2023-11-26 02:27:06,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3176653.3333333335, ans=0.125
2023-11-26 02:27:14,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0
2023-11-26 02:27:18,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3176720.0, ans=0.1
2023-11-26 02:27:25,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3176786.6666666665, ans=0.0
2023-11-26 02:27:26,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.676e+01 8.615e+01 8.990e+01 9.647e+01 1.278e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-26 02:27:29,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3176786.6666666665, ans=0.0
2023-11-26 02:27:29,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0
2023-11-26 02:27:31,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3176786.6666666665, ans=0.125
2023-11-26 02:27:36,401 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7600, loss[loss=0.0642, simple_loss=0.08518, pruned_loss=0.01167, audio_tagging_loss=0.00993, over 14770.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09084, pruned_loss=0.01274, audio_tagging_loss=0.00877, over 3043846.91 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:27:43,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3176853.3333333335, ans=0.125
2023-11-26 02:27:59,756 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476550
2023-11-26 02:28:17,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3177053.3333333335, ans=0.0
2023-11-26 02:28:31,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0
2023-11-26 02:28:31,835 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7650, loss[loss=0.05832, simple_loss=0.07709, pruned_loss=0.01045, audio_tagging_loss=0.009324, over 14477.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09037, pruned_loss=0.01268, audio_tagging_loss=0.008766, over 3044515.38 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:28:54,252 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476600
2023-11-26 02:29:12,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3177386.6666666665, ans=0.125
2023-11-26 02:29:18,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.582e+01 9.244e+01 1.012e+02 1.285e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-26 02:29:23,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3177453.3333333335, ans=0.0
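The ScheduledFloat entries record hyperparameters (dropout probabilities, skip rates, balancer limits) that are functions of batch_count rather than constants; by batch_count ~3.17e6 the schedules in this run have settled at their final values (dropout_p=0.1, most skip rates 0.0). A minimal sketch of a piecewise-linear schedule of this kind; the real class in scaling.py does more, this shows only the interpolation:

def scheduled_float(batch_count, points):
    # points: [(batch_count, value), ...] sorted by batch_count; the value
    # is clamped at the first/last point outside that range.
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0

# Example (made-up schedule): a dropout decaying from 0.3 to 0.1 over the
# first 20k batches has long since reached 0.1 at batch_count ~3.17e6.
assert scheduled_float(3177453.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1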
2023-11-26 02:29:28,321 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7700, loss[loss=0.07127, simple_loss=0.09371, pruned_loss=0.01577, audio_tagging_loss=0.008642, over 15276.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.0908, pruned_loss=0.01285, audio_tagging_loss=0.008757, over 3044376.25 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:29:35,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3177520.0, ans=0.0
2023-11-26 02:29:41,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3177586.6666666665, ans=0.125
2023-11-26 02:29:45,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3177586.6666666665, ans=0.2
2023-11-26 02:29:50,120 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476650
2023-11-26 02:29:50,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3177653.3333333335, ans=0.125
2023-11-26 02:29:59,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=12.0
2023-11-26 02:30:13,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0
2023-11-26 02:30:23,205 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7750, loss[loss=0.05435, simple_loss=0.07001, pruned_loss=0.008334, audio_tagging_loss=0.011, over 15788.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08992, pruned_loss=0.01259, audio_tagging_loss=0.008799, over 3042850.19 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:30:40,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3177920.0, ans=0.125
2023-11-26 02:30:45,812 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476700
2023-11-26 02:31:07,835 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.684e+01 9.280e+01 1.001e+02 1.211e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-26 02:31:17,897 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7800, loss[loss=0.07023, simple_loss=0.08973, pruned_loss=0.01374, audio_tagging_loss=0.01162, over 14641.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09021, pruned_loss=0.01267, audio_tagging_loss=0.008769, over 3037868.78 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:31:19,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3178186.6666666665, ans=0.125
2023-11-26 02:31:33,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3178253.3333333335, ans=0.2
2023-11-26 02:31:38,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3178253.3333333335, ans=0.125
2023-11-26 02:31:40,630 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476750
2023-11-26 02:31:51,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3178386.6666666665, ans=0.1
2023-11-26 02:31:58,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3178386.6666666665, ans=0.125
2023-11-26 02:31:59,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3178386.6666666665, ans=0.0
2023-11-26 02:32:14,227 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7850, loss[loss=0.06086, simple_loss=0.07509, pruned_loss=0.01077, audio_tagging_loss=0.01254, over 15576.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08958, pruned_loss=0.0127, audio_tagging_loss=0.00885, over 3032909.84 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:32:23,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3178586.6666666665, ans=0.125
2023-11-26 02:32:32,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3178586.6666666665, ans=0.125
2023-11-26 02:32:35,295 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476800
2023-11-26 02:32:55,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3178720.0, ans=0.04949747468305833
2023-11-26 02:32:59,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.601e+01 9.556e+01 1.008e+02 1.371e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-26 02:33:01,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3178786.6666666665, ans=0.0
2023-11-26 02:33:09,237 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7900, loss[loss=0.05406, simple_loss=0.06803, pruned_loss=0.01069, audio_tagging_loss=0.009358, over 14727.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08952, pruned_loss=0.01272, audio_tagging_loss=0.008947, over 3035128.20 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:33:09,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3178853.3333333335, ans=0.125
2023-11-26 02:33:12,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.42 vs. limit=10.0
2023-11-26 02:33:31,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476850
2023-11-26 02:33:38,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3178986.6666666665, ans=0.025
2023-11-26 02:34:04,795 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7950, loss[loss=0.06823, simple_loss=0.08924, pruned_loss=0.01457, audio_tagging_loss=0.009044, over 15181.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08866, pruned_loss=0.01266, audio_tagging_loss=0.009107, over 3037307.35 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:34:17,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3179253.3333333335, ans=0.0
2023-11-26 02:34:17,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3179253.3333333335, ans=0.125
2023-11-26 02:34:19,491 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 02:34:27,419 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476900
2023-11-26 02:34:28,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3179320.0, ans=0.2
2023-11-26 02:34:29,794 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 02:34:31,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3179320.0, ans=0.125
2023-11-26 02:34:40,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3179386.6666666665, ans=0.1
2023-11-26 02:34:43,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3179386.6666666665, ans=0.09899494936611666
2023-11-26 02:34:50,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.906e+01 9.549e+01 1.026e+02 1.284e+02, threshold=1.910e+02, percent-clipped=0.0
2023-11-26 02:35:00,599 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8000, loss[loss=0.07449, simple_loss=0.1038, pruned_loss=0.01348, audio_tagging_loss=0.009107, over 16600.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08896, pruned_loss=0.01253, audio_tagging_loss=0.009116, over 3037517.00 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:35:02,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3179520.0, ans=0.125
2023-11-26 02:35:17,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.38 vs. limit=22.5
2023-11-26 02:35:20,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3179586.6666666665, ans=0.0
2023-11-26 02:35:22,365 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476950
2023-11-26 02:35:23,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3179653.3333333335, ans=0.0
2023-11-26 02:35:34,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3179720.0, ans=0.1
2023-11-26 02:35:44,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3179786.6666666665, ans=0.125
2023-11-26 02:35:53,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3179786.6666666665, ans=0.09899494936611666
2023-11-26 02:35:56,008 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8050, loss[loss=0.06777, simple_loss=0.08586, pruned_loss=0.01546, audio_tagging_loss=0.009381, over 13885.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09, pruned_loss=0.01268, audio_tagging_loss=0.009057, over 3043835.18 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:36:03,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3179853.3333333335, ans=0.1
2023-11-26 02:36:03,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3179853.3333333335, ans=0.0
2023-11-26 02:36:14,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3179920.0, ans=0.0
2023-11-26 02:36:18,260 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477000
2023-11-26 02:36:21,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3179986.6666666665, ans=0.025
2023-11-26 02:36:29,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0
2023-11-26 02:36:41,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.827e+01 9.529e+01 1.030e+02 1.385e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-26 02:36:51,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3180186.6666666665, ans=0.2
2023-11-26 02:36:51,849 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8100, loss[loss=0.04556, simple_loss=0.05795, pruned_loss=0.007264, audio_tagging_loss=0.009319, over 16377.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08979, pruned_loss=0.01261, audio_tagging_loss=0.009146, over 3043204.02 frames. ], batch size: 64, lr: 1.69e-03, grad_scale: 32.0
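Entries like these are easiest to digest as a curve; a small parser that pulls (epoch, batch, tot_loss) triples out of a log in exactly this format, keyed on the train_asr.py:1235 lines:

import re

# Matches e.g. "Epoch 40, batch 8100, loss[...], tot_loss[loss=0.06665, ..."
PAT = re.compile(r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([0-9.]+)")

def tot_loss_curve(log_path):
    points = []
    with open(log_path) as f:
        for line in f:
            m = PAT.search(line)
            if m:
                points.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return points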
2023-11-26 02:36:59,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=12.0
2023-11-26 02:37:05,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3180253.3333333335, ans=0.125
2023-11-26 02:37:14,079 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477050
2023-11-26 02:37:27,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0
2023-11-26 02:37:39,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0
2023-11-26 02:37:47,793 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8150, loss[loss=0.05826, simple_loss=0.07549, pruned_loss=0.00916, audio_tagging_loss=0.01136, over 16424.00 frames. ], tot_loss[loss=0.06771, simple_loss=0.09152, pruned_loss=0.01293, audio_tagging_loss=0.009019, over 3052162.01 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:37:50,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0
2023-11-26 02:37:56,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.09 vs. limit=10.0
2023-11-26 02:38:00,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.11 vs. limit=15.0
2023-11-26 02:38:09,412 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477100
2023-11-26 02:38:26,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3180720.0, ans=0.0
2023-11-26 02:38:33,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.566e+01 9.339e+01 1.007e+02 1.243e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-26 02:38:43,161 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8200, loss[loss=0.06338, simple_loss=0.08827, pruned_loss=0.0102, audio_tagging_loss=0.009051, over 16436.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09152, pruned_loss=0.01293, audio_tagging_loss=0.008879, over 3048624.35 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:38:44,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3180853.3333333335, ans=0.2
2023-11-26 02:38:45,254 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 02:39:04,743 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477150
2023-11-26 02:39:14,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3180986.6666666665, ans=0.07
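The Whitening entries compare a per-module whitening metric against a limit; scaling.py uses such a measure to keep feature covariances from collapsing into a few directions. One illustrative metric, normalized so that a perfectly white (identity-like) covariance scores 1.0; this is an assumption about the flavour of the measure, not the exact formula in scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (..., num_channels) activations; higher return values mean the
    # channels within each group are more correlated / unevenly scaled.
    x = x.reshape(-1, x.shape[-1])
    metrics = []
    for g in x.chunk(num_groups, dim=-1):
        d = g.shape[-1]
        cov = g.T @ g / g.shape[0]
        metrics.append((d * (cov ** 2).mean() / cov.diagonal().mean() ** 2).item())
    return max(metrics)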
2023-11-26 02:39:21,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.57 vs. limit=15.0
2023-11-26 02:39:38,051 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8250, loss[loss=0.0558, simple_loss=0.07459, pruned_loss=0.00935, audio_tagging_loss=0.009153, over 15273.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08948, pruned_loss=0.01258, audio_tagging_loss=0.008835, over 3050460.44 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:39:54,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3181253.3333333335, ans=0.0
2023-11-26 02:39:54,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3181253.3333333335, ans=0.2
2023-11-26 02:40:00,632 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477200
2023-11-26 02:40:19,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3181386.6666666665, ans=0.0
2023-11-26 02:40:23,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3181453.3333333335, ans=0.2
2023-11-26 02:40:23,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.407e+01 9.078e+01 9.588e+01 1.625e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-26 02:40:33,986 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8300, loss[loss=0.08378, simple_loss=0.1205, pruned_loss=0.0172, audio_tagging_loss=0.006352, over 15348.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08929, pruned_loss=0.01248, audio_tagging_loss=0.008835, over 3049162.60 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:40:46,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0
2023-11-26 02:40:56,322 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477250
2023-11-26 02:41:00,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3181653.3333333335, ans=0.125
2023-11-26 02:41:29,727 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8350, loss[loss=0.05831, simple_loss=0.06718, pruned_loss=0.01464, audio_tagging_loss=0.01008, over 14766.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08914, pruned_loss=0.01253, audio_tagging_loss=0.008916, over 3047251.79 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:41:31,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3181853.3333333335, ans=0.2
2023-11-26 02:41:31,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3181853.3333333335, ans=0.125
2023-11-26 02:41:52,057 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477300
2023-11-26 02:42:16,199 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.717e+01 9.531e+01 1.032e+02 1.340e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-26 02:42:21,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3182120.0, ans=0.2
2023-11-26 02:42:25,229 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8400, loss[loss=0.0723, simple_loss=0.096, pruned_loss=0.01497, audio_tagging_loss=0.009325, over 15294.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08882, pruned_loss=0.01242, audio_tagging_loss=0.008972, over 3051332.90 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:42:25,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0
2023-11-26 02:42:27,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3182186.6666666665, ans=0.125
2023-11-26 02:42:33,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3182186.6666666665, ans=0.5
2023-11-26 02:42:39,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3182253.3333333335, ans=0.0
2023-11-26 02:42:47,845 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477350
2023-11-26 02:42:49,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3182320.0, ans=0.0
2023-11-26 02:43:12,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3182453.3333333335, ans=0.125
2023-11-26 02:43:20,969 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8450, loss[loss=0.05006, simple_loss=0.06197, pruned_loss=0.009595, audio_tagging_loss=0.009486, over 14227.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08912, pruned_loss=0.01238, audio_tagging_loss=0.008962, over 3048637.52 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:43:35,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3182586.6666666665, ans=0.125
2023-11-26 02:43:42,535 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477400
2023-11-26 02:43:58,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3182720.0, ans=0.05
2023-11-26 02:44:09,230 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.762e+01 9.219e+01 9.767e+01 1.234e+02, threshold=1.844e+02, percent-clipped=0.0
2023-11-26 02:44:16,651 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8500, loss[loss=0.06099, simple_loss=0.08098, pruned_loss=0.01135, audio_tagging_loss=0.009151, over 15698.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09023, pruned_loss=0.01247, audio_tagging_loss=0.008918, over 3051235.07 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:44:26,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.55 vs. limit=10.0
2023-11-26 02:44:39,001 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477450
2023-11-26 02:44:43,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3182986.6666666665, ans=0.1
2023-11-26 02:44:44,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=3182986.6666666665, ans=0.1
2023-11-26 02:44:44,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3182986.6666666665, ans=0.125
2023-11-26 02:44:56,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3183053.3333333335, ans=0.125
2023-11-26 02:45:01,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3183120.0, ans=0.2
2023-11-26 02:45:06,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3183120.0, ans=10.0
2023-11-26 02:45:11,607 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8550, loss[loss=0.07424, simple_loss=0.1045, pruned_loss=0.01489, audio_tagging_loss=0.0071, over 14836.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09054, pruned_loss=0.01253, audio_tagging_loss=0.008838, over 3051615.76 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:45:29,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3183253.3333333335, ans=0.0
2023-11-26 02:45:34,876 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477500
2023-11-26 02:45:48,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3183386.6666666665, ans=0.2
2023-11-26 02:45:59,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 8.847e+01 9.604e+01 1.040e+02 3.358e+02, threshold=1.921e+02, percent-clipped=1.0
2023-11-26 02:46:03,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3183453.3333333335, ans=0.125
2023-11-26 02:46:07,456 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8600, loss[loss=0.05934, simple_loss=0.08634, pruned_loss=0.007772, audio_tagging_loss=0.008402, over 15624.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09041, pruned_loss=0.01263, audio_tagging_loss=0.008844, over 3044961.09 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:46:29,590 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477550
2023-11-26 02:46:40,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5
2023-11-26 02:46:46,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3183720.0, ans=0.1
2023-11-26 02:46:50,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0
2023-11-26 02:47:02,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3183853.3333333335, ans=0.0
2023-11-26 02:47:03,346 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8650, loss[loss=0.04598, simple_loss=0.05542, pruned_loss=0.008674, audio_tagging_loss=0.009602, over 13734.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09024, pruned_loss=0.0126, audio_tagging_loss=0.008894, over 3042720.79 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:47:20,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3183920.0, ans=0.125
2023-11-26 02:47:23,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3183986.6666666665, ans=0.125
2023-11-26 02:47:24,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3183986.6666666665, ans=0.1
2023-11-26 02:47:24,860 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477600
2023-11-26 02:47:24,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3183986.6666666665, ans=0.0
2023-11-26 02:47:25,039 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 02:47:26,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3183986.6666666665, ans=0.125
2023-11-26 02:47:28,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3183986.6666666665, ans=0.125
2023-11-26 02:47:50,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.714e+01 9.384e+01 1.008e+02 1.495e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-26 02:47:57,879 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8700, loss[loss=0.0596, simple_loss=0.07663, pruned_loss=0.01233, audio_tagging_loss=0.008955, over 15679.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09032, pruned_loss=0.01263, audio_tagging_loss=0.00898, over 3047106.13 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:48:00,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3184186.6666666665, ans=0.125
2023-11-26 02:48:18,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3184253.3333333335, ans=0.125
2023-11-26 02:48:21,288 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477650
2023-11-26 02:48:28,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0
2023-11-26 02:48:41,525 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 02:48:45,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0
2023-11-26 02:48:45,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3184453.3333333335, ans=0.04949747468305833
2023-11-26 02:48:53,989 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8750, loss[loss=0.07152, simple_loss=0.09631, pruned_loss=0.01541, audio_tagging_loss=0.007957, over 15610.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09057, pruned_loss=0.01278, audio_tagging_loss=0.009068, over 3036173.80 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:49:01,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3184520.0, ans=0.0
2023-11-26 02:49:15,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3184653.3333333335, ans=0.1
2023-11-26 02:49:16,043 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477700
2023-11-26 02:49:19,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3184653.3333333335, ans=0.125
2023-11-26 02:49:32,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3184720.0, ans=0.0
2023-11-26 02:49:38,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0
2023-11-26 02:49:41,956 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.883e+01 9.536e+01 1.029e+02 1.726e+02, threshold=1.907e+02, percent-clipped=0.0
2023-11-26 02:49:49,893 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8800, loss[loss=0.05994, simple_loss=0.08346, pruned_loss=0.0101, audio_tagging_loss=0.008113, over 15916.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09065, pruned_loss=0.01275, audio_tagging_loss=0.009132, over 3043709.95 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:49:52,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3184853.3333333335, ans=0.07
2023-11-26 02:49:59,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3184920.0, ans=0.0
2023-11-26 02:50:11,205 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477750
2023-11-26 02:50:15,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3184986.6666666665, ans=0.125
2023-11-26 02:50:36,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.74 vs. limit=10.0
2023-11-26 02:50:44,758 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8850, loss[loss=0.07244, simple_loss=0.09714, pruned_loss=0.01162, audio_tagging_loss=0.01225, over 16149.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09096, pruned_loss=0.01284, audio_tagging_loss=0.009107, over 3045139.54 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:50:56,862 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 02:51:06,924 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477800
2023-11-26 02:51:14,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3185320.0, ans=0.125
2023-11-26 02:51:16,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3185320.0, ans=0.0
2023-11-26 02:51:24,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0
2023-11-26 02:51:25,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3185386.6666666665, ans=0.0
2023-11-26 02:51:31,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.669e+01 9.316e+01 9.941e+01 1.332e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-26 02:51:39,826 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8900, loss[loss=0.05819, simple_loss=0.07833, pruned_loss=0.01094, audio_tagging_loss=0.008083, over 15315.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09121, pruned_loss=0.01273, audio_tagging_loss=0.008886, over 3047555.04 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:52:02,508 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477850
2023-11-26 02:52:36,334 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8950, loss[loss=0.08541, simple_loss=0.1151, pruned_loss=0.01953, audio_tagging_loss=0.00835, over 16599.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09011, pruned_loss=0.0125, audio_tagging_loss=0.008866, over 3053716.73 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:52:38,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3185853.3333333335, ans=0.0
2023-11-26 02:52:39,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3185853.3333333335, ans=0.125
2023-11-26 02:52:47,054 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 02:52:47,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3185920.0, ans=0.0
2023-11-26 02:52:57,355 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477900
2023-11-26 02:52:57,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3185986.6666666665, ans=0.0
2023-11-26 02:53:23,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.837e+01 9.264e+01 1.015e+02 1.330e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-26 02:53:31,286 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9000, loss[loss=0.06965, simple_loss=0.09058, pruned_loss=0.01583, audio_tagging_loss=0.008527, over 14369.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08944, pruned_loss=0.01236, audio_tagging_loss=0.008804, over 3047902.86 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:53:31,287 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-26 02:54:03,238 INFO [train_asr.py:1267] (1/4) Epoch 40, validation: loss=0.05846, simple_loss=0.05059, pruned_loss=0.005121, audio_tagging_loss=0.02804, over 4681554.00 frames.
2023-11-26 02:54:03,238 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-26 02:54:25,316 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477950
2023-11-26 02:54:54,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3186453.3333333335, ans=0.2
2023-11-26 02:54:59,520 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9050, loss[loss=0.07407, simple_loss=0.0956, pruned_loss=0.01784, audio_tagging_loss=0.008426, over 15165.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08915, pruned_loss=0.01225, audio_tagging_loss=0.00882, over 3049445.29 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:55:08,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=22.5
2023-11-26 02:55:20,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478000
2023-11-26 02:55:21,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3186653.3333333335, ans=0.04949747468305833
2023-11-26 02:55:23,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3186653.3333333335, ans=0.1
2023-11-26 02:55:48,459 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.738e+01 9.239e+01 1.027e+02 1.338e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-26 02:55:54,936 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9100, loss[loss=0.07416, simple_loss=0.1121, pruned_loss=0.01243, audio_tagging_loss=0.005679, over 15159.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08953, pruned_loss=0.01222, audio_tagging_loss=0.008746, over 3059136.57 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:55:55,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0
2023-11-26 02:56:17,378 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478050
2023-11-26 02:56:46,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3187120.0, ans=0.2
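The validation entry above follows the same weighting as the training totals (0.5 * 0.05059 + 0.005121 + 0.02804 ≈ 0.05846), and the memory line after it can be reproduced from PyTorch's allocator statistics; the exact format string in train_asr.py is assumed:

import torch

def max_memory_message(device) -> str:
    # Peak bytes held by tensors on this device since startup (or since the
    # last torch.cuda.reset_peak_memory_stats call), printed in MB.
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return f"Maximum memory allocated so far is {mb}MB"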
], batch size: 63, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 02:56:51,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3187186.6666666665, ans=0.125 2023-11-26 02:57:12,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3187320.0, ans=0.0 2023-11-26 02:57:13,555 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478100 2023-11-26 02:57:39,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.630e+01 9.008e+01 9.650e+01 1.509e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-26 02:57:46,744 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9200, loss[loss=0.07303, simple_loss=0.104, pruned_loss=0.01624, audio_tagging_loss=0.004775, over 15378.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09062, pruned_loss=0.01261, audio_tagging_loss=0.008671, over 3053225.84 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:57:49,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3187520.0, ans=0.0 2023-11-26 02:57:50,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2023-11-26 02:57:51,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3187520.0, ans=0.0 2023-11-26 02:57:58,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0 2023-11-26 02:58:08,658 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478150 2023-11-26 02:58:17,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3187653.3333333335, ans=0.0 2023-11-26 02:58:29,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3187720.0, ans=0.0 2023-11-26 02:58:30,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3187786.6666666665, ans=0.0 2023-11-26 02:58:34,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.77 vs. limit=22.5 2023-11-26 02:58:40,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3187786.6666666665, ans=0.2 2023-11-26 02:58:42,627 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9250, loss[loss=0.05945, simple_loss=0.08124, pruned_loss=0.01091, audio_tagging_loss=0.007922, over 14887.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09003, pruned_loss=0.01245, audio_tagging_loss=0.008768, over 3056374.07 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:59:01,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187920.0, ans=0.1 2023-11-26 02:59:03,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.64 vs. 
limit=22.5 2023-11-26 02:59:04,969 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478200 2023-11-26 02:59:06,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3187986.6666666665, ans=0.0 2023-11-26 02:59:12,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3187986.6666666665, ans=0.125 2023-11-26 02:59:31,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.730e+01 8.687e+01 9.467e+01 1.008e+02 1.287e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 02:59:38,523 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9300, loss[loss=0.05745, simple_loss=0.08244, pruned_loss=0.009528, audio_tagging_loss=0.006708, over 15132.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09015, pruned_loss=0.01249, audio_tagging_loss=0.00873, over 3054382.43 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:59:42,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3188186.6666666665, ans=0.0 2023-11-26 02:59:48,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3188253.3333333335, ans=0.2 2023-11-26 02:59:55,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3188253.3333333335, ans=0.0 2023-11-26 03:00:00,861 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478250 2023-11-26 03:00:01,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2023-11-26 03:00:15,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3188386.6666666665, ans=0.0 2023-11-26 03:00:20,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3188386.6666666665, ans=0.125 2023-11-26 03:00:25,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=12.0 2023-11-26 03:00:34,957 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9350, loss[loss=0.06127, simple_loss=0.07815, pruned_loss=0.01273, audio_tagging_loss=0.009461, over 15328.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09035, pruned_loss=0.01248, audio_tagging_loss=0.008773, over 3055796.69 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:00:40,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.36 vs. limit=15.0 2023-11-26 03:00:43,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3188520.0, ans=0.125 2023-11-26 03:00:44,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. 
limit=15.0 2023-11-26 03:00:56,913 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478300 2023-11-26 03:01:22,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3188786.6666666665, ans=0.0 2023-11-26 03:01:23,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.528e+01 9.262e+01 1.004e+02 1.387e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 03:01:30,309 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9400, loss[loss=0.06413, simple_loss=0.08727, pruned_loss=0.01155, audio_tagging_loss=0.008946, over 16288.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09076, pruned_loss=0.01243, audio_tagging_loss=0.008868, over 3060403.37 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:01:38,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=22.5 2023-11-26 03:01:52,678 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478350 2023-11-26 03:01:55,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3188986.6666666665, ans=0.0 2023-11-26 03:02:00,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-26 03:02:25,001 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:02:26,073 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9450, loss[loss=0.07935, simple_loss=0.1138, pruned_loss=0.01346, audio_tagging_loss=0.00899, over 15368.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09029, pruned_loss=0.01238, audio_tagging_loss=0.008982, over 3060312.51 frames. 
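Note on the Exclude-cut WARNINGs: the dropped AudioSet clip has 100 feature frames (one second at a 10 ms hop), which the convolutional front end reduces to 23 encoder frames, fewer than the 24 BPE tokens of its dummy transcript; no monotonic alignment exists, so the transducer loss would be invalid. A sketch of the check, assuming the usual icefall subsampling length formula, which reproduces the logged 100 -> 23 mapping:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Output length of the ~4x convolutional front end (assumed formula;
    # it maps 100 input frames to the 23 reported in the WARNING).
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Drop cuts whose encoder output is shorter than their token sequence.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded, as logged
```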
], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:02:26,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3189186.6666666665, ans=0.1 2023-11-26 03:02:49,013 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478400 2023-11-26 03:02:50,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3189320.0, ans=0.035 2023-11-26 03:02:55,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3189320.0, ans=0.1 2023-11-26 03:03:02,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3189386.6666666665, ans=0.0 2023-11-26 03:03:07,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3189386.6666666665, ans=0.125 2023-11-26 03:03:16,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.757e+01 9.306e+01 1.009e+02 1.204e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 03:03:22,606 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9500, loss[loss=0.0508, simple_loss=0.06802, pruned_loss=0.008091, audio_tagging_loss=0.008699, over 16499.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09063, pruned_loss=0.01254, audio_tagging_loss=0.009045, over 3053872.06 frames. ], batch size: 64, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:03:45,023 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478450 2023-11-26 03:03:51,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3189653.3333333335, ans=0.0 2023-11-26 03:04:18,269 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9550, loss[loss=0.06923, simple_loss=0.09875, pruned_loss=0.01286, audio_tagging_loss=0.006996, over 14374.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09065, pruned_loss=0.01265, audio_tagging_loss=0.009076, over 3055726.46 frames. 
], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:04:19,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3189853.3333333335, ans=0.1 2023-11-26 03:04:20,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3189853.3333333335, ans=0.125 2023-11-26 03:04:25,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3189853.3333333335, ans=0.5 2023-11-26 03:04:28,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3189920.0, ans=0.0 2023-11-26 03:04:40,865 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478500 2023-11-26 03:04:58,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3190053.3333333335, ans=0.0 2023-11-26 03:05:07,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3190120.0, ans=0.125 2023-11-26 03:05:08,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.706e+01 9.165e+01 9.979e+01 1.359e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 03:05:11,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3190120.0, ans=0.2 2023-11-26 03:05:14,053 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9600, loss[loss=0.05916, simple_loss=0.07571, pruned_loss=0.009761, audio_tagging_loss=0.01154, over 14788.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09063, pruned_loss=0.0126, audio_tagging_loss=0.009108, over 3058498.76 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:05:37,114 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478550 2023-11-26 03:05:49,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=22.5 2023-11-26 03:06:10,228 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9650, loss[loss=0.06202, simple_loss=0.08843, pruned_loss=0.01112, audio_tagging_loss=0.006675, over 15937.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.0913, pruned_loss=0.01275, audio_tagging_loss=0.009007, over 3054658.04 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:06:13,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3190520.0, ans=0.0 2023-11-26 03:06:18,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.79 vs. 
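Note on the scaling.py:213 ScheduledFloat records: these trace hyperparameters (skip rates, dropout probabilities, balancer bounds) that are functions of the global batch count rather than constants; each line reports the value (ans=...) resolved at the current batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are illustrative, not taken from this run:

```python
from bisect import bisect_right

def scheduled_float(batch_count: float,
                    schedule: list[tuple[float, float]]) -> float:
    # Piecewise-linear interpolation of `schedule` at `batch_count`,
    # clamped to the end values outside the breakpoint range.
    xs = [x for x, _ in schedule]
    ys = [y for _, y in schedule]
    if batch_count <= xs[0]:
        return ys[0]
    if batch_count >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, batch_count) - 1
    frac = (batch_count - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])

# A dropout annealed from 0.3 to 0.1 over the first 20k batches would have
# long since settled at its final value by batch_count ~ 3.19e6:
print(scheduled_float(3_190_000, [(0.0, 0.3), (20_000.0, 0.1)]))  # 0.1
```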
limit=15.0 2023-11-26 03:06:19,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3190520.0, ans=0.1 2023-11-26 03:06:27,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3190586.6666666665, ans=0.0 2023-11-26 03:06:31,800 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478600 2023-11-26 03:06:54,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3190786.6666666665, ans=0.125 2023-11-26 03:07:00,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.580e+01 9.247e+01 1.001e+02 1.210e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 03:07:01,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5 2023-11-26 03:07:06,035 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9700, loss[loss=0.07406, simple_loss=0.1073, pruned_loss=0.01247, audio_tagging_loss=0.007948, over 15977.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09103, pruned_loss=0.01283, audio_tagging_loss=0.008877, over 3049450.51 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:07:06,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3190853.3333333335, ans=0.125 2023-11-26 03:07:09,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3190853.3333333335, ans=0.1 2023-11-26 03:07:28,289 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478650 2023-11-26 03:08:01,015 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9750, loss[loss=0.06213, simple_loss=0.08003, pruned_loss=0.01301, audio_tagging_loss=0.009114, over 15676.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09021, pruned_loss=0.01273, audio_tagging_loss=0.008826, over 3042204.65 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:08:12,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=22.5 2023-11-26 03:08:22,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. 
limit=6.0 2023-11-26 03:08:24,560 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478700 2023-11-26 03:08:26,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3191320.0, ans=0.1 2023-11-26 03:08:34,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3191386.6666666665, ans=0.125 2023-11-26 03:08:52,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.589e+01 9.136e+01 9.841e+01 1.317e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 03:08:55,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3191453.3333333335, ans=0.125 2023-11-26 03:08:57,335 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9800, loss[loss=0.07655, simple_loss=0.1002, pruned_loss=0.01423, audio_tagging_loss=0.01222, over 15437.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08978, pruned_loss=0.01269, audio_tagging_loss=0.008802, over 3043591.38 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:08:59,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3191520.0, ans=0.125 2023-11-26 03:09:01,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3191520.0, ans=0.125 2023-11-26 03:09:10,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3191586.6666666665, ans=0.1 2023-11-26 03:09:15,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3191586.6666666665, ans=0.0 2023-11-26 03:09:19,736 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478750 2023-11-26 03:09:48,444 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:09:50,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3191786.6666666665, ans=0.0 2023-11-26 03:09:53,711 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9850, loss[loss=0.06421, simple_loss=0.08899, pruned_loss=0.01188, audio_tagging_loss=0.007839, over 16325.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09024, pruned_loss=0.0127, audio_tagging_loss=0.008741, over 3047042.49 frames. 
], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:10:15,531 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478800 2023-11-26 03:10:20,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3191986.6666666665, ans=0.125 2023-11-26 03:10:44,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 8.656e+01 9.156e+01 9.841e+01 1.408e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-26 03:10:48,795 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9900, loss[loss=0.07224, simple_loss=0.1035, pruned_loss=0.01156, audio_tagging_loss=0.008914, over 15381.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09055, pruned_loss=0.01281, audio_tagging_loss=0.008665, over 3048181.38 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:10:52,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-26 03:10:56,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3192186.6666666665, ans=0.0 2023-11-26 03:11:11,605 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478850 2023-11-26 03:11:11,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3192320.0, ans=0.125 2023-11-26 03:11:20,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3192320.0, ans=0.125 2023-11-26 03:11:44,259 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9950, loss[loss=0.05776, simple_loss=0.07823, pruned_loss=0.009561, audio_tagging_loss=0.009083, over 15983.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09074, pruned_loss=0.01275, audio_tagging_loss=0.008736, over 3046631.59 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:11:51,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3192520.0, ans=0.0 2023-11-26 03:11:54,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3192520.0, ans=0.0 2023-11-26 03:11:59,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-26 03:12:01,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3192586.6666666665, ans=0.0 2023-11-26 03:12:04,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3192586.6666666665, ans=0.125 2023-11-26 03:12:06,741 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478900 2023-11-26 03:12:07,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2023-11-26 03:12:08,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.18 vs. 
limit=15.0 2023-11-26 03:12:30,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3192786.6666666665, ans=0.125 2023-11-26 03:12:36,544 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.464e+01 9.280e+01 9.880e+01 1.249e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 03:12:40,794 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10000, loss[loss=0.07103, simple_loss=0.1081, pruned_loss=0.01081, audio_tagging_loss=0.0062, over 16310.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08999, pruned_loss=0.01255, audio_tagging_loss=0.008736, over 3044537.44 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:12:41,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3192853.3333333335, ans=0.2 2023-11-26 03:12:44,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3192853.3333333335, ans=0.125 2023-11-26 03:12:52,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=12.0 2023-11-26 03:12:55,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.71 vs. limit=22.5 2023-11-26 03:13:01,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2023-11-26 03:13:02,300 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478950 2023-11-26 03:13:02,444 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:13:04,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3192986.6666666665, ans=0.125 2023-11-26 03:13:09,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2023-11-26 03:13:10,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3192986.6666666665, ans=0.95 2023-11-26 03:13:15,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=3193053.3333333335, ans=22.5 2023-11-26 03:13:22,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3193053.3333333335, ans=0.125 2023-11-26 03:13:25,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2023-11-26 03:13:36,315 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10050, loss[loss=0.06229, simple_loss=0.08824, pruned_loss=0.009343, audio_tagging_loss=0.008829, over 16114.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09037, pruned_loss=0.01257, audio_tagging_loss=0.008709, over 3049569.77 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:13:37,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.24 vs. 
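Note on the scaling.py:1022 Whitening records: each compares an anisotropy metric of a module's activations against a whitening limit, and the corresponding penalty only engages when the metric exceeds the limit, which is why these lines mostly report metrics below their limits. A sketch of a metric with the right properties, assuming a scale-invariant statistic that equals 1.0 when the feature covariance is proportional to the identity and grows as the spectrum spreads (the actual definition is in icefall's scaling.py):

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns 1.0 for perfectly whitened
    # features, larger values for anisotropic covariance (assumed form).
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]          # (C, C) covariance estimate
    c = cov.shape[0]
    # trace(cov @ cov) * C / trace(cov)**2; by Cauchy-Schwarz always >= 1.
    return c * (cov * cov).sum() / (torch.diagonal(cov).sum() ** 2)

x = torch.randn(100_000, 64)              # near-whitened random features
print(float(whitening_metric(x)))         # ~1.0; structured features score higher
```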
limit=22.5 2023-11-26 03:13:40,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3193186.6666666665, ans=0.0 2023-11-26 03:13:58,675 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479000 2023-11-26 03:13:59,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0 2023-11-26 03:14:02,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3193320.0, ans=0.2 2023-11-26 03:14:26,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.389e+01 9.159e+01 9.802e+01 1.976e+02, threshold=1.832e+02, percent-clipped=1.0 2023-11-26 03:14:31,503 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10100, loss[loss=0.06315, simple_loss=0.08984, pruned_loss=0.009468, audio_tagging_loss=0.008765, over 15391.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09056, pruned_loss=0.01248, audio_tagging_loss=0.00871, over 3054816.26 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:14:53,768 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479050 2023-11-26 03:15:16,532 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:15:16,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3193786.6666666665, ans=0.0 2023-11-26 03:15:27,575 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10150, loss[loss=0.08693, simple_loss=0.1242, pruned_loss=0.01678, audio_tagging_loss=0.008035, over 15427.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09048, pruned_loss=0.01257, audio_tagging_loss=0.00883, over 3055076.26 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:15:36,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3193853.3333333335, ans=0.125 2023-11-26 03:15:36,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3193853.3333333335, ans=0.1 2023-11-26 03:15:49,039 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479100 2023-11-26 03:15:53,165 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 03:15:54,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3193986.6666666665, ans=0.125 2023-11-26 03:16:11,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3194120.0, ans=0.125 2023-11-26 03:16:19,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.867e+01 9.466e+01 1.017e+02 1.236e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 03:16:20,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3194120.0, ans=0.125 2023-11-26 03:16:22,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3194186.6666666665, ans=0.07 2023-11-26 03:16:22,876 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10200, loss[loss=0.0832, simple_loss=0.1165, pruned_loss=0.01746, audio_tagging_loss=0.007513, over 14567.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.0907, pruned_loss=0.0126, audio_tagging_loss=0.008864, over 3051184.67 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:16:40,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3194253.3333333335, ans=0.1 2023-11-26 03:16:45,237 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:16:45,299 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479150 2023-11-26 03:16:47,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3194320.0, ans=0.0 2023-11-26 03:16:53,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-26 03:17:11,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2023-11-26 03:17:17,601 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10250, loss[loss=0.07039, simple_loss=0.09375, pruned_loss=0.01533, audio_tagging_loss=0.008184, over 14670.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09055, pruned_loss=0.01251, audio_tagging_loss=0.008934, over 3059304.25 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:17:19,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2023-11-26 03:17:34,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.78 vs. 
limit=12.0 2023-11-26 03:17:38,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3194586.6666666665, ans=0.0 2023-11-26 03:17:41,083 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479200 2023-11-26 03:18:10,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.528e+01 9.326e+01 1.007e+02 1.324e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 03:18:11,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3194786.6666666665, ans=0.025 2023-11-26 03:18:14,617 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10300, loss[loss=0.0627, simple_loss=0.07788, pruned_loss=0.01335, audio_tagging_loss=0.01042, over 15656.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.0911, pruned_loss=0.01262, audio_tagging_loss=0.00896, over 3064130.30 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:18:19,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3194853.3333333335, ans=0.0 2023-11-26 03:18:29,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3194920.0, ans=0.0 2023-11-26 03:18:32,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3194920.0, ans=0.125 2023-11-26 03:18:36,433 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479250 2023-11-26 03:18:46,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2023-11-26 03:18:46,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3195053.3333333335, ans=0.1 2023-11-26 03:18:52,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3195053.3333333335, ans=0.1 2023-11-26 03:18:56,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3195053.3333333335, ans=0.0 2023-11-26 03:19:09,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3195186.6666666665, ans=0.1 2023-11-26 03:19:10,598 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10350, loss[loss=0.07844, simple_loss=0.1068, pruned_loss=0.01611, audio_tagging_loss=0.008946, over 15317.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09177, pruned_loss=0.01281, audio_tagging_loss=0.00903, over 3061159.85 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:19:10,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3195186.6666666665, ans=0.0 2023-11-26 03:19:12,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.39 vs. 
limit=15.0 2023-11-26 03:19:22,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3195253.3333333335, ans=0.125 2023-11-26 03:19:32,395 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479300 2023-11-26 03:19:33,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3195320.0, ans=0.0 2023-11-26 03:19:35,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.52 vs. limit=10.0 2023-11-26 03:19:43,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5 2023-11-26 03:19:44,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=12.0 2023-11-26 03:19:51,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3195386.6666666665, ans=0.0 2023-11-26 03:20:03,837 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.860e+01 9.371e+01 1.044e+02 1.411e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 03:20:06,037 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10400, loss[loss=0.0551, simple_loss=0.07336, pruned_loss=0.007945, audio_tagging_loss=0.01047, over 15096.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09094, pruned_loss=0.01274, audio_tagging_loss=0.00909, over 3057524.19 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:20:24,873 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:20:26,017 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:20:29,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479350 2023-11-26 03:20:53,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2023-11-26 03:21:00,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3195786.6666666665, ans=0.0 2023-11-26 03:21:02,554 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10450, loss[loss=0.05059, simple_loss=0.06219, pruned_loss=0.008791, audio_tagging_loss=0.01071, over 15724.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09114, pruned_loss=0.01274, audio_tagging_loss=0.009015, over 3059098.20 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:21:13,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2023-11-26 03:21:15,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3195920.0, ans=0.125 2023-11-26 03:21:24,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.79 vs. 
limit=15.0 2023-11-26 03:21:25,157 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479400 2023-11-26 03:21:38,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2023-11-26 03:21:41,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3196053.3333333335, ans=0.0 2023-11-26 03:21:56,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.740e+01 9.314e+01 1.011e+02 1.304e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 03:21:58,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2023-11-26 03:21:59,372 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10500, loss[loss=0.06979, simple_loss=0.09386, pruned_loss=0.01362, audio_tagging_loss=0.009238, over 15212.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09069, pruned_loss=0.01247, audio_tagging_loss=0.008933, over 3055471.39 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:22:09,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3196253.3333333335, ans=0.0 2023-11-26 03:22:13,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2023-11-26 03:22:21,254 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479450 2023-11-26 03:22:21,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2023-11-26 03:22:25,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3196320.0, ans=0.1 2023-11-26 03:22:25,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3196320.0, ans=0.125 2023-11-26 03:22:32,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3196386.6666666665, ans=0.125 2023-11-26 03:22:39,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3196386.6666666665, ans=0.125 2023-11-26 03:22:40,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3196386.6666666665, ans=0.125 2023-11-26 03:22:43,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3196453.3333333335, ans=0.125 2023-11-26 03:22:48,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3196453.3333333335, ans=0.125 2023-11-26 03:22:54,784 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10550, loss[loss=0.0657, simple_loss=0.08071, pruned_loss=0.01211, audio_tagging_loss=0.01324, over 15458.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09007, pruned_loss=0.01234, audio_tagging_loss=0.00883, over 3051752.50 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:23:05,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3196586.6666666665, ans=0.2 2023-11-26 03:23:05,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3196586.6666666665, ans=0.0 2023-11-26 03:23:07,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3196586.6666666665, ans=0.125 2023-11-26 03:23:17,846 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479500 2023-11-26 03:23:27,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3196653.3333333335, ans=0.125 2023-11-26 03:23:36,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3196720.0, ans=0.0 2023-11-26 03:23:48,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.656e+01 9.441e+01 1.040e+02 1.486e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 03:23:50,640 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10600, loss[loss=0.08762, simple_loss=0.1184, pruned_loss=0.01985, audio_tagging_loss=0.00855, over 15123.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09006, pruned_loss=0.01233, audio_tagging_loss=0.00872, over 3054729.05 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:23:54,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.27 vs. limit=22.5 2023-11-26 03:23:59,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3196853.3333333335, ans=0.2 2023-11-26 03:24:02,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3196920.0, ans=0.125 2023-11-26 03:24:13,349 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479550 2023-11-26 03:24:14,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3196986.6666666665, ans=0.1 2023-11-26 03:24:14,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3196986.6666666665, ans=0.5 2023-11-26 03:24:25,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3197053.3333333335, ans=0.0 2023-11-26 03:24:46,463 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10650, loss[loss=0.07892, simple_loss=0.1041, pruned_loss=0.01841, audio_tagging_loss=0.008472, over 15344.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.0903, pruned_loss=0.01248, audio_tagging_loss=0.008617, over 3052792.97 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:24:46,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3197186.6666666665, ans=0.2 2023-11-26 03:25:08,816 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479600 2023-11-26 03:25:10,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3197320.0, ans=0.125 2023-11-26 03:25:14,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2023-11-26 03:25:14,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3197320.0, ans=0.1 2023-11-26 03:25:17,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3197320.0, ans=0.0 2023-11-26 03:25:21,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2023-11-26 03:25:22,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2023-11-26 03:25:31,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3197453.3333333335, ans=0.5 2023-11-26 03:25:41,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.674e+01 8.472e+01 9.133e+01 9.975e+01 1.277e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 03:25:42,591 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10700, loss[loss=0.07212, simple_loss=0.09904, pruned_loss=0.01562, audio_tagging_loss=0.006979, over 14551.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09044, pruned_loss=0.01251, audio_tagging_loss=0.008649, over 3053297.95 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 8.0 2023-11-26 03:25:56,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3197586.6666666665, ans=0.025 2023-11-26 03:25:58,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3197586.6666666665, ans=0.125 2023-11-26 03:26:05,524 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479650 2023-11-26 03:26:07,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3197653.3333333335, ans=0.1 2023-11-26 03:26:13,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2023-11-26 03:26:14,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3197653.3333333335, ans=0.1 2023-11-26 03:26:15,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3197720.0, ans=0.1 2023-11-26 03:26:28,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.03 vs. 
limit=12.0 2023-11-26 03:26:35,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3197786.6666666665, ans=0.2 2023-11-26 03:26:38,386 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10750, loss[loss=0.081, simple_loss=0.1048, pruned_loss=0.01997, audio_tagging_loss=0.00864, over 14156.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09074, pruned_loss=0.01265, audio_tagging_loss=0.008624, over 3051616.64 frames. ], batch size: 52, lr: 1.68e-03, grad_scale: 8.0 2023-11-26 03:26:51,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3197920.0, ans=0.1 2023-11-26 03:26:55,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3197920.0, ans=0.0 2023-11-26 03:27:00,730 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479700 2023-11-26 03:27:08,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3197986.6666666665, ans=0.125 2023-11-26 03:27:25,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2023-11-26 03:27:26,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3198120.0, ans=0.125 2023-11-26 03:27:33,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.525e+01 9.180e+01 9.934e+01 1.273e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 03:27:34,668 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10800, loss[loss=0.06578, simple_loss=0.0919, pruned_loss=0.01099, audio_tagging_loss=0.008836, over 15458.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09019, pruned_loss=0.01262, audio_tagging_loss=0.008606, over 3048816.34 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:27:42,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3198186.6666666665, ans=0.125 2023-11-26 03:27:43,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3198186.6666666665, ans=0.1 2023-11-26 03:27:56,901 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479750 2023-11-26 03:28:14,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3198386.6666666665, ans=0.125 2023-11-26 03:28:20,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3198453.3333333335, ans=0.125 2023-11-26 03:28:21,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.59 vs. limit=10.0 2023-11-26 03:28:29,926 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10850, loss[loss=0.06831, simple_loss=0.09571, pruned_loss=0.008959, audio_tagging_loss=0.0115, over 15126.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08973, pruned_loss=0.01244, audio_tagging_loss=0.008648, over 3052019.72 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:28:33,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=22.5 2023-11-26 03:28:45,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3198586.6666666665, ans=0.0 2023-11-26 03:28:53,356 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479800 2023-11-26 03:28:59,081 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:29:09,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.36 vs. limit=10.0 2023-11-26 03:29:21,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3198786.6666666665, ans=0.1 2023-11-26 03:29:23,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3198786.6666666665, ans=0.125 2023-11-26 03:29:24,291 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:29:25,262 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.656e+01 9.276e+01 9.852e+01 1.367e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 03:29:25,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3198853.3333333335, ans=0.1 2023-11-26 03:29:26,357 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10900, loss[loss=0.0619, simple_loss=0.07996, pruned_loss=0.01026, audio_tagging_loss=0.01166, over 15307.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09027, pruned_loss=0.01271, audio_tagging_loss=0.008665, over 3049713.12 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:29:48,678 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479850 2023-11-26 03:29:57,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3198986.6666666665, ans=0.09899494936611666 2023-11-26 03:30:01,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2023-11-26 03:30:10,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3199120.0, ans=0.04949747468305833 2023-11-26 03:30:11,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3199120.0, ans=0.125 2023-11-26 03:30:22,731 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10950, loss[loss=0.06756, simple_loss=0.09267, pruned_loss=0.01331, audio_tagging_loss=0.007913, over 15179.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08964, pruned_loss=0.01255, audio_tagging_loss=0.008748, over 3054903.38 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:30:27,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3199186.6666666665, ans=0.09899494936611666 2023-11-26 03:30:44,440 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479900 2023-11-26 03:30:57,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3199386.6666666665, ans=0.2 2023-11-26 03:31:00,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3199386.6666666665, ans=0.1 2023-11-26 03:31:06,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3199453.3333333335, ans=0.05 2023-11-26 03:31:16,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.536e+01 9.275e+01 9.846e+01 1.244e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 03:31:17,921 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11000, loss[loss=0.06171, simple_loss=0.07699, pruned_loss=0.01238, audio_tagging_loss=0.01084, over 13768.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.0903, pruned_loss=0.01259, audio_tagging_loss=0.00875, over 3057150.66 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:31:18,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3199520.0, ans=0.125 2023-11-26 03:31:22,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3199520.0, ans=0.125 2023-11-26 03:31:26,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3199520.0, ans=0.125 2023-11-26 03:31:28,494 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:31:40,796 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479950 2023-11-26 03:31:46,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3199653.3333333335, ans=0.0 2023-11-26 03:32:01,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3199786.6666666665, ans=0.125 2023-11-26 03:32:14,136 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11050, loss[loss=0.04997, simple_loss=0.06682, pruned_loss=0.006653, audio_tagging_loss=0.009911, over 14655.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08985, pruned_loss=0.01249, audio_tagging_loss=0.008811, over 3056033.10 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:32:24,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.40 vs. 
limit=15.0 2023-11-26 03:32:28,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3199920.0, ans=0.1 2023-11-26 03:32:31,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2023-11-26 03:32:36,586 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480000 2023-11-26 03:32:45,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3199986.6666666665, ans=0.125 2023-11-26 03:32:47,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3199986.6666666665, ans=0.125 2023-11-26 03:32:55,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3200053.3333333335, ans=0.2 2023-11-26 03:33:10,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3200120.0, ans=0.125 2023-11-26 03:33:11,099 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.630e+01 9.436e+01 1.031e+02 1.953e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-26 03:33:12,217 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11100, loss[loss=0.058, simple_loss=0.08203, pruned_loss=0.008869, audio_tagging_loss=0.008113, over 14935.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.0908, pruned_loss=0.01261, audio_tagging_loss=0.008869, over 3051615.68 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:33:29,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2023-11-26 03:33:33,444 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480050 2023-11-26 03:33:53,666 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:34:07,203 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11150, loss[loss=0.06891, simple_loss=0.09998, pruned_loss=0.01121, audio_tagging_loss=0.007702, over 15368.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09035, pruned_loss=0.01237, audio_tagging_loss=0.009006, over 3047620.45 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:34:12,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3200520.0, ans=0.125 2023-11-26 03:34:16,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3200586.6666666665, ans=0.2 2023-11-26 03:34:25,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.95 vs. limit=6.0 2023-11-26 03:34:27,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3200586.6666666665, ans=0.0 2023-11-26 03:34:28,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3200653.3333333335, ans=0.0 2023-11-26 03:34:28,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.22 vs. 
limit=15.0 2023-11-26 03:34:29,454 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480100 2023-11-26 03:34:52,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3200786.6666666665, ans=0.125 2023-11-26 03:34:53,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3200786.6666666665, ans=0.125 2023-11-26 03:35:00,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.918e+01 9.641e+01 1.031e+02 1.753e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 03:35:02,653 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11200, loss[loss=0.06615, simple_loss=0.09033, pruned_loss=0.0123, audio_tagging_loss=0.008692, over 15946.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08991, pruned_loss=0.01233, audio_tagging_loss=0.009129, over 3047561.32 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:35:23,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3200920.0, ans=0.0 2023-11-26 03:35:25,406 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480150 2023-11-26 03:35:35,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3201053.3333333335, ans=0.0 2023-11-26 03:35:56,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=22.5 2023-11-26 03:35:57,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3201186.6666666665, ans=0.125 2023-11-26 03:35:59,476 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11250, loss[loss=0.0612, simple_loss=0.08236, pruned_loss=0.01188, audio_tagging_loss=0.008145, over 15408.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08943, pruned_loss=0.01236, audio_tagging_loss=0.009181, over 3042144.82 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:36:06,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3201186.6666666665, ans=0.04949747468305833 2023-11-26 03:36:06,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3201186.6666666665, ans=0.125 2023-11-26 03:36:06,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3201186.6666666665, ans=0.125 2023-11-26 03:36:10,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3201253.3333333335, ans=0.2 2023-11-26 03:36:20,650 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480200 2023-11-26 03:36:29,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. 
limit=15.0 2023-11-26 03:36:47,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3201453.3333333335, ans=0.0 2023-11-26 03:36:53,607 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.665e+01 9.306e+01 1.002e+02 1.136e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 03:36:54,685 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11300, loss[loss=0.07993, simple_loss=0.117, pruned_loss=0.01219, audio_tagging_loss=0.009235, over 15104.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08955, pruned_loss=0.01247, audio_tagging_loss=0.009058, over 3040710.78 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:36:58,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3201520.0, ans=0.125 2023-11-26 03:37:03,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3201520.0, ans=0.2 2023-11-26 03:37:03,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3201520.0, ans=0.2 2023-11-26 03:37:10,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3201586.6666666665, ans=0.0 2023-11-26 03:37:13,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3201586.6666666665, ans=0.0 2023-11-26 03:37:14,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3201586.6666666665, ans=0.2 2023-11-26 03:37:16,555 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480250 2023-11-26 03:37:18,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3201653.3333333335, ans=0.2 2023-11-26 03:37:20,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3201653.3333333335, ans=0.125 2023-11-26 03:37:23,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3201653.3333333335, ans=0.2 2023-11-26 03:37:24,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3201653.3333333335, ans=0.2 2023-11-26 03:37:26,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3201653.3333333335, ans=0.1 2023-11-26 03:37:35,254 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:37:49,992 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11350, loss[loss=0.07558, simple_loss=0.1021, pruned_loss=0.0141, audio_tagging_loss=0.01043, over 14733.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08998, pruned_loss=0.01244, audio_tagging_loss=0.008929, over 3045449.75 frames. 
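The ScheduledFloat entries above each record a hyperparameter (skip rate, balancer probability, dropout p, scale_min, ...) whose effective value is re-derived from the current batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name matches the log, but the implementation details are illustrative rather than the actual scaling.py code:

    import bisect

    class ScheduledFloat:
        # Illustrative stand-in: a float whose value depends on batch_count,
        # interpolated linearly between scheduled breakpoints.
        def __init__(self, *points, default=0.0):
            # points are (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
            self.points = sorted(points)
            self.default = default
            self.batch_count = None  # set by the training loop each step

        def __float__(self):
            if self.batch_count is None or not self.points:
                return float(self.default)
            if self.batch_count <= self.points[0][0]:
                return float(self.points[0][1])
            if self.batch_count >= self.points[-1][0]:
                return float(self.points[-1][1])
            xs = [x for x, _ in self.points]
            i = bisect.bisect_right(xs, self.batch_count)
            (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
            t = (self.batch_count - x0) / (x1 - x0)
            return float(y0 + t * (y1 - y0))

Under this reading, a logged value like ans=0.09899494936611666 is simply float(schedule) evaluated at the batch_count shown in the same entry.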
], batch size: 53, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:37:55,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3201853.3333333335, ans=0.1 2023-11-26 03:37:55,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3201853.3333333335, ans=0.04949747468305833 2023-11-26 03:37:55,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.19 vs. limit=10.0 2023-11-26 03:37:59,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3201853.3333333335, ans=0.0 2023-11-26 03:38:00,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3201920.0, ans=0.1 2023-11-26 03:38:01,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.46 vs. limit=22.5 2023-11-26 03:38:13,024 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480300 2023-11-26 03:38:23,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3202053.3333333335, ans=0.125 2023-11-26 03:38:24,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2023-11-26 03:38:26,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3202053.3333333335, ans=10.0 2023-11-26 03:38:31,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3202053.3333333335, ans=0.0 2023-11-26 03:38:44,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.638e+01 9.308e+01 1.022e+02 1.333e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 03:38:45,432 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11400, loss[loss=0.06474, simple_loss=0.08678, pruned_loss=0.01277, audio_tagging_loss=0.008575, over 15787.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08965, pruned_loss=0.01239, audio_tagging_loss=0.008938, over 3041140.01 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:39:07,948 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480350 2023-11-26 03:39:41,783 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11450, loss[loss=0.07477, simple_loss=0.1105, pruned_loss=0.0133, audio_tagging_loss=0.006199, over 14973.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08921, pruned_loss=0.01238, audio_tagging_loss=0.008923, over 3036563.61 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:39:49,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3202520.0, ans=0.05 2023-11-26 03:39:51,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3202586.6666666665, ans=0.0 2023-11-26 03:40:03,725 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480400 2023-11-26 03:40:06,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3202653.3333333335, ans=0.2 2023-11-26 03:40:29,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3202786.6666666665, ans=0.0 2023-11-26 03:40:33,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3202786.6666666665, ans=0.1 2023-11-26 03:40:37,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.830e+01 9.338e+01 1.004e+02 1.564e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 03:40:37,384 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11500, loss[loss=0.0645, simple_loss=0.09825, pruned_loss=0.009323, audio_tagging_loss=0.00605, over 15622.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08931, pruned_loss=0.01241, audio_tagging_loss=0.008866, over 3036756.55 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:40:46,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3202853.3333333335, ans=0.2 2023-11-26 03:40:50,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3202920.0, ans=0.2 2023-11-26 03:41:00,696 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480450 2023-11-26 03:41:10,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3203053.3333333335, ans=0.1 2023-11-26 03:41:22,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3203120.0, ans=0.1 2023-11-26 03:41:33,096 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11550, loss[loss=0.06342, simple_loss=0.08299, pruned_loss=0.01219, audio_tagging_loss=0.009726, over 15550.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08994, pruned_loss=0.01251, audio_tagging_loss=0.008846, over 3041027.80 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:41:34,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3203186.6666666665, ans=0.1 2023-11-26 03:41:39,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3203186.6666666665, ans=0.0 2023-11-26 03:41:54,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3203253.3333333335, ans=0.125 2023-11-26 03:41:55,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3203320.0, ans=0.125 2023-11-26 03:41:55,971 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480500 2023-11-26 03:41:58,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3203320.0, ans=0.0 2023-11-26 03:42:02,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3203320.0, ans=0.015 2023-11-26 03:42:02,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3203320.0, ans=0.0 2023-11-26 03:42:02,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3203320.0, ans=0.025 2023-11-26 03:42:09,072 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:42:10,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3203386.6666666665, ans=0.5 2023-11-26 03:42:29,043 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.932e+01 9.599e+01 1.033e+02 1.724e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-26 03:42:29,068 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11600, loss[loss=0.05101, simple_loss=0.07073, pruned_loss=0.006478, audio_tagging_loss=0.009171, over 15273.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.0909, pruned_loss=0.01272, audio_tagging_loss=0.00875, over 3054574.66 frames. 
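The WARNING above (like its twin earlier in this log) drops a one-second AudioSet cut whose placeholder transcript tokenizes to more BPE tokens (24) than there are encoder frames left after subsampling (23), a combination under which the transducer loss cannot be computed. A hedged sketch of that check follows; the subsampling arithmetic reproduces the logged 100 -> 23 mapping, while the function name and call pattern are assumptions:

    import logging

    def keep_cut(cut, sp) -> bool:
        # Illustrative filter: reject cuts with more tokens than
        # post-subsampling encoder frames.
        num_frames = cut.num_frames                  # 100 in the warning above
        T = ((num_frames - 7) // 2 + 1) // 2         # assumed conv subsampling: 100 -> 23
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        if T < len(tokens):
            logging.warning(
                f"Exclude cut with ID {cut.id} from training. "
                f"Number of frames (before subsampling): {num_frames}. "
                f"Number of frames (after subsampling): {T}. "
                f"Tokens: {tokens}. Number of tokens: {len(tokens)}"
            )
            return False
        return True

With lhotse, such a predicate would typically be applied via CutSet.filter, currying in the sentencepiece processor sp with functools.partial.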
], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:42:46,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3203586.6666666665, ans=0.125 2023-11-26 03:42:50,951 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480550 2023-11-26 03:42:59,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3203653.3333333335, ans=0.1 2023-11-26 03:43:04,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3203720.0, ans=0.05 2023-11-26 03:43:08,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3203720.0, ans=0.125 2023-11-26 03:43:13,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2023-11-26 03:43:24,221 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11650, loss[loss=0.05671, simple_loss=0.07589, pruned_loss=0.01173, audio_tagging_loss=0.007039, over 15344.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09117, pruned_loss=0.01267, audio_tagging_loss=0.008765, over 3053818.78 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:43:24,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3203853.3333333335, ans=0.125 2023-11-26 03:43:32,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3203853.3333333335, ans=0.125 2023-11-26 03:43:37,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3203920.0, ans=0.0 2023-11-26 03:43:42,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-11-26 03:43:46,883 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480600 2023-11-26 03:44:13,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-26 03:44:19,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.387e+01 9.006e+01 9.801e+01 1.650e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-26 03:44:19,978 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11700, loss[loss=0.06329, simple_loss=0.08558, pruned_loss=0.01051, audio_tagging_loss=0.00999, over 15784.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09023, pruned_loss=0.01249, audio_tagging_loss=0.008873, over 3043305.75 frames. 
], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:44:22,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3204186.6666666665, ans=0.125 2023-11-26 03:44:29,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3204186.6666666665, ans=0.125 2023-11-26 03:44:29,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3204186.6666666665, ans=0.125 2023-11-26 03:44:36,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2023-11-26 03:44:41,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.42 vs. limit=22.5 2023-11-26 03:44:42,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2023-11-26 03:44:42,855 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480650 2023-11-26 03:44:45,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3204320.0, ans=0.05 2023-11-26 03:44:48,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.0 2023-11-26 03:44:51,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3204320.0, ans=0.0 2023-11-26 03:44:52,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3204386.6666666665, ans=0.025 2023-11-26 03:45:15,960 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11750, loss[loss=0.06887, simple_loss=0.08499, pruned_loss=0.01146, audio_tagging_loss=0.01491, over 15635.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.0899, pruned_loss=0.01258, audio_tagging_loss=0.008941, over 3040733.63 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:45:23,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-11-26 03:45:24,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3204520.0, ans=0.0 2023-11-26 03:45:34,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.75 vs. 
limit=10.0 2023-11-26 03:45:38,316 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480700 2023-11-26 03:45:38,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3204653.3333333335, ans=0.09899494936611666 2023-11-26 03:46:00,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3204786.6666666665, ans=0.0 2023-11-26 03:46:11,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.706e+01 8.820e+01 9.557e+01 1.032e+02 1.520e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 03:46:11,505 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11800, loss[loss=0.07241, simple_loss=0.1062, pruned_loss=0.01292, audio_tagging_loss=0.006369, over 15480.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.089, pruned_loss=0.01246, audio_tagging_loss=0.008992, over 3035353.86 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:46:19,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=12.0 2023-11-26 03:46:28,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-11-26 03:46:34,385 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480750 2023-11-26 03:46:38,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3204986.6666666665, ans=0.125 2023-11-26 03:46:41,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3204986.6666666665, ans=0.125 2023-11-26 03:46:44,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3205053.3333333335, ans=0.05 2023-11-26 03:47:07,425 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11850, loss[loss=0.07081, simple_loss=0.1029, pruned_loss=0.01311, audio_tagging_loss=0.00625, over 15637.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08975, pruned_loss=0.0126, audio_tagging_loss=0.008934, over 3038934.95 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:47:13,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3205186.6666666665, ans=0.1 2023-11-26 03:47:21,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0 2023-11-26 03:47:29,853 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480800 2023-11-26 03:47:40,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3205386.6666666665, ans=0.0 2023-11-26 03:47:47,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3205386.6666666665, ans=0.2 2023-11-26 03:48:03,871 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11900, loss[loss=0.06212, simple_loss=0.09134, pruned_loss=0.007556, audio_tagging_loss=0.008893, over 15021.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09037, pruned_loss=0.01264, audio_tagging_loss=0.00896, over 3044522.35 frames. 
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:48:04,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 8.863e+01 9.443e+01 1.007e+02 1.384e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-26 03:48:21,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3205586.6666666665, ans=0.09899494936611666 2023-11-26 03:48:25,795 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480850 2023-11-26 03:48:49,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3205786.6666666665, ans=0.125 2023-11-26 03:48:53,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-26 03:48:59,010 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11950, loss[loss=0.06778, simple_loss=0.1018, pruned_loss=0.008517, audio_tagging_loss=0.008366, over 15013.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09015, pruned_loss=0.01255, audio_tagging_loss=0.00907, over 3047678.64 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:49:00,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3205853.3333333335, ans=0.0 2023-11-26 03:49:03,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-26 03:49:06,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3205853.3333333335, ans=0.0 2023-11-26 03:49:17,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3205920.0, ans=0.025 2023-11-26 03:49:19,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3205920.0, ans=0.0 2023-11-26 03:49:21,503 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480900 2023-11-26 03:49:44,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-26 03:49:48,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3206120.0, ans=0.1 2023-11-26 03:49:51,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5 2023-11-26 03:49:53,273 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 12000, loss[loss=0.07375, simple_loss=0.09916, pruned_loss=0.01471, audio_tagging_loss=0.009459, over 15399.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09113, pruned_loss=0.01274, audio_tagging_loss=0.009082, over 3052529.40 frames. 
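The recurring optim.py lines summarize adaptive gradient clipping. With Clipping_scale=2.0, the threshold consistently equals twice the middle of the five "grad-norm quartiles" values (here 2.0 x 9.443e+01 ~ 1.889e+02), suggesting the five numbers are the min/25%/median/75%/max of recent gradient norms and that clipping is applied against clipping_scale times the median. A rough reconstruction under those assumptions (the window size is invented; this is not ScaledAdam's actual bookkeeping):

    import torch

    def clip_with_median_threshold(params, recent_norms, clipping_scale=2.0, window=128):
        # Track recent global grad norms, report their quantiles, and clip
        # the current gradient against clipping_scale * median.
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        recent_norms.append(norm)
        del recent_norms[:-window]                   # keep a sliding window
        q = torch.quantile(torch.tensor(recent_norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()     # 2.0 x median, as in the log
        if norm > threshold:
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return q.tolist(), threshold

The "percent-clipped" field would then be the fraction of recent batches whose norm exceeded that threshold.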
], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:49:53,273 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 03:50:12,373 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8555, 3.0342, 2.8074, 2.6766, 3.4272, 3.3989, 3.1786, 3.6533], device='cuda:1') 2023-11-26 03:50:25,664 INFO [train_asr.py:1267] (1/4) Epoch 40, validation: loss=0.0579, simple_loss=0.05064, pruned_loss=0.005235, audio_tagging_loss=0.02734, over 4681554.00 frames. 2023-11-26 03:50:25,665 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 03:50:26,640 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.771e+01 9.492e+01 1.018e+02 1.259e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 03:50:47,186 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480950 2023-11-26 03:51:24,317 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 0, loss[loss=0.07665, simple_loss=0.08805, pruned_loss=0.009225, audio_tagging_loss=0.0234, over 14691.00 frames. ], tot_loss[loss=0.07665, simple_loss=0.08805, pruned_loss=0.009225, audio_tagging_loss=0.0234, over 14691.00 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:51:24,317 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 03:51:55,673 INFO [train_asr.py:1267] (1/4) Epoch 41, validation: loss=0.05811, simple_loss=0.05068, pruned_loss=0.005302, audio_tagging_loss=0.02746, over 4681554.00 frames. 2023-11-26 03:51:55,674 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 03:51:55,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3206360.0, ans=0.0 2023-11-26 03:52:02,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.16 vs. limit=22.5 2023-11-26 03:52:10,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3206426.6666666665, ans=0.0 2023-11-26 03:52:13,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3206426.6666666665, ans=0.125 2023-11-26 03:52:44,392 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481000 2023-11-26 03:52:51,492 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 50, loss[loss=0.05296, simple_loss=0.05912, pruned_loss=0.006874, audio_tagging_loss=0.01653, over 14751.00 frames. ], tot_loss[loss=0.07656, simple_loss=0.09371, pruned_loss=0.01335, audio_tagging_loss=0.01635, over 691396.91 frames. 
], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:52:54,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3206693.3333333335, ans=0.125 2023-11-26 03:53:06,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3206760.0, ans=0.125 2023-11-26 03:53:12,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3206760.0, ans=0.125 2023-11-26 03:53:19,597 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.787e+01 9.351e+01 1.009e+02 1.085e+02 1.541e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-26 03:53:19,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3206826.6666666665, ans=0.125 2023-11-26 03:53:41,037 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481050 2023-11-26 03:53:45,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=3206960.0, ans=15.0 2023-11-26 03:53:47,341 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 100, loss[loss=0.0751, simple_loss=0.09834, pruned_loss=0.0132, audio_tagging_loss=0.01273, over 16563.00 frames. ], tot_loss[loss=0.07496, simple_loss=0.09169, pruned_loss=0.013, audio_tagging_loss=0.01611, over 1214850.31 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:53:49,678 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:53:54,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.21 vs. limit=10.0 2023-11-26 03:53:57,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3207093.3333333335, ans=0.1 2023-11-26 03:54:01,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3207093.3333333335, ans=0.0 2023-11-26 03:54:03,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3207093.3333333335, ans=0.0 2023-11-26 03:54:08,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3207160.0, ans=0.0 2023-11-26 03:54:17,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-26 03:54:23,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0 2023-11-26 03:54:36,742 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481100 2023-11-26 03:54:43,050 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 150, loss[loss=0.07902, simple_loss=0.1095, pruned_loss=0.01534, audio_tagging_loss=0.008951, over 15504.00 frames. ], tot_loss[loss=0.07224, simple_loss=0.09063, pruned_loss=0.01245, audio_tagging_loss=0.01447, over 1614931.21 frames. 
], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:54:51,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3207360.0, ans=0.0 2023-11-26 03:54:58,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3207426.6666666665, ans=0.0 2023-11-26 03:55:00,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3207426.6666666665, ans=0.0 2023-11-26 03:55:10,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 9.007e+01 9.477e+01 1.014e+02 1.465e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 03:55:23,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3207560.0, ans=0.05 2023-11-26 03:55:29,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2023-11-26 03:55:30,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.06 vs. limit=15.0 2023-11-26 03:55:32,195 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481150 2023-11-26 03:55:35,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-11-26 03:55:38,447 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 200, loss[loss=0.07466, simple_loss=0.1053, pruned_loss=0.01429, audio_tagging_loss=0.007737, over 14278.00 frames. ], tot_loss[loss=0.07042, simple_loss=0.09001, pruned_loss=0.01241, audio_tagging_loss=0.013, over 1919415.91 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:55:43,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3207693.3333333335, ans=0.0 2023-11-26 03:55:47,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3207693.3333333335, ans=0.05 2023-11-26 03:55:48,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3207693.3333333335, ans=15.0 2023-11-26 03:55:54,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3207760.0, ans=10.0 2023-11-26 03:56:04,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3207826.6666666665, ans=0.125 2023-11-26 03:56:11,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3207893.3333333335, ans=0.0 2023-11-26 03:56:28,208 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481200 2023-11-26 03:56:28,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=22.5 2023-11-26 03:56:35,403 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 250, loss[loss=0.07032, simple_loss=0.0841, pruned_loss=0.01619, audio_tagging_loss=0.01208, over 15143.00 frames. ], tot_loss[loss=0.06999, simple_loss=0.09099, pruned_loss=0.01271, audio_tagging_loss=0.01178, over 2179413.06 frames. 
], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:56:37,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3208026.6666666665, ans=0.125 2023-11-26 03:57:04,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.671e+01 8.798e+01 9.430e+01 1.056e+02 1.787e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 03:57:13,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3208226.6666666665, ans=0.0 2023-11-26 03:57:24,981 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481250 2023-11-26 03:57:31,763 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 300, loss[loss=0.08244, simple_loss=0.1187, pruned_loss=0.01609, audio_tagging_loss=0.007002, over 14623.00 frames. ], tot_loss[loss=0.0691, simple_loss=0.09108, pruned_loss=0.01258, audio_tagging_loss=0.01098, over 2374567.10 frames. ], batch size: 50, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:57:46,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3208426.6666666665, ans=0.125 2023-11-26 03:57:56,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3208493.3333333335, ans=0.125 2023-11-26 03:58:05,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3208560.0, ans=0.125 2023-11-26 03:58:07,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3208560.0, ans=0.1 2023-11-26 03:58:20,685 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481300 2023-11-26 03:58:26,964 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 350, loss[loss=0.05869, simple_loss=0.06934, pruned_loss=0.01102, audio_tagging_loss=0.013, over 15606.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09104, pruned_loss=0.01276, audio_tagging_loss=0.01031, over 2524915.50 frames. 
], batch size: 59, lr: 1.66e-03, grad_scale: 8.0 2023-11-26 03:58:28,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3208693.3333333335, ans=0.0 2023-11-26 03:58:29,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3208693.3333333335, ans=0.2 2023-11-26 03:58:39,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3208760.0, ans=0.0 2023-11-26 03:58:40,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3208760.0, ans=0.0 2023-11-26 03:58:49,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3208826.6666666665, ans=0.1 2023-11-26 03:58:57,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3208826.6666666665, ans=0.125 2023-11-26 03:58:57,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.469e+01 9.311e+01 1.023e+02 1.499e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 03:58:58,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3208826.6666666665, ans=0.125 2023-11-26 03:58:58,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3208826.6666666665, ans=0.125 2023-11-26 03:59:01,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3208893.3333333335, ans=0.125 2023-11-26 03:59:16,448 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481350 2023-11-26 03:59:19,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3208960.0, ans=0.125 2023-11-26 03:59:22,710 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 400, loss[loss=0.06451, simple_loss=0.09042, pruned_loss=0.01122, audio_tagging_loss=0.008074, over 15487.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09002, pruned_loss=0.01263, audio_tagging_loss=0.00996, over 2637891.56 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:59:23,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3209026.6666666665, ans=0.1 2023-11-26 03:59:44,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3209160.0, ans=0.125 2023-11-26 04:00:11,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481400 2023-11-26 04:00:13,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3209293.3333333335, ans=0.1 2023-11-26 04:00:19,430 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 450, loss[loss=0.04319, simple_loss=0.05062, pruned_loss=0.007798, audio_tagging_loss=0.01008, over 15375.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.08989, pruned_loss=0.01251, audio_tagging_loss=0.009719, over 2722277.23 frames. 
], batch size: 62, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:00:20,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3209360.0, ans=0.0 2023-11-26 04:00:29,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=15.0 2023-11-26 04:00:48,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.492e+01 9.023e+01 9.553e+01 1.244e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-26 04:00:51,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3209560.0, ans=0.1 2023-11-26 04:00:55,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3209560.0, ans=0.1 2023-11-26 04:01:04,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3209626.6666666665, ans=0.0 2023-11-26 04:01:08,254 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481450 2023-11-26 04:01:14,673 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 500, loss[loss=0.06301, simple_loss=0.08665, pruned_loss=0.01171, audio_tagging_loss=0.007972, over 14329.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.0897, pruned_loss=0.01242, audio_tagging_loss=0.009563, over 2793968.67 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:01:19,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3209693.3333333335, ans=0.0 2023-11-26 04:01:24,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3209693.3333333335, ans=0.2 2023-11-26 04:01:29,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3209760.0, ans=0.0 2023-11-26 04:01:32,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3209760.0, ans=0.125 2023-11-26 04:01:46,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3209826.6666666665, ans=0.125 2023-11-26 04:02:04,060 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481500 2023-11-26 04:02:10,846 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 550, loss[loss=0.06853, simple_loss=0.09809, pruned_loss=0.01141, audio_tagging_loss=0.00808, over 15551.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08966, pruned_loss=0.01238, audio_tagging_loss=0.009381, over 2853518.93 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:02:11,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3210026.6666666665, ans=0.125 2023-11-26 04:02:11,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2023-11-26 04:02:13,244 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:02:29,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. 
limit=10.0 2023-11-26 04:02:32,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3210160.0, ans=0.2 2023-11-26 04:02:41,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.521e+01 9.213e+01 9.979e+01 1.259e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 04:02:53,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0 2023-11-26 04:02:57,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3210293.3333333335, ans=0.125 2023-11-26 04:02:59,636 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481550 2023-11-26 04:03:06,638 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 600, loss[loss=0.06837, simple_loss=0.09391, pruned_loss=0.01213, audio_tagging_loss=0.009279, over 15439.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08892, pruned_loss=0.01226, audio_tagging_loss=0.009441, over 2897341.33 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:03:17,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3210426.6666666665, ans=0.125 2023-11-26 04:03:47,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2023-11-26 04:03:55,065 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481600 2023-11-26 04:04:01,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=22.5 2023-11-26 04:04:01,758 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 650, loss[loss=0.08208, simple_loss=0.1051, pruned_loss=0.01789, audio_tagging_loss=0.01162, over 15875.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08871, pruned_loss=0.01214, audio_tagging_loss=0.009413, over 2939614.45 frames. ], batch size: 60, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:04:05,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3210693.3333333335, ans=0.125 2023-11-26 04:04:26,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=22.5 2023-11-26 04:04:30,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=22.5 2023-11-26 04:04:32,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.848e+01 9.324e+01 9.991e+01 1.249e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 04:04:49,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3210960.0, ans=0.2 2023-11-26 04:04:50,512 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481650 2023-11-26 04:04:52,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3210960.0, ans=0.2 2023-11-26 04:04:57,409 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 700, loss[loss=0.04726, simple_loss=0.05623, pruned_loss=0.007428, audio_tagging_loss=0.01172, over 15280.00 frames. 
], tot_loss[loss=0.06585, simple_loss=0.08887, pruned_loss=0.01211, audio_tagging_loss=0.009303, over 2959995.28 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:05:00,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3211026.6666666665, ans=0.1 2023-11-26 04:05:14,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3211093.3333333335, ans=0.2 2023-11-26 04:05:31,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2023-11-26 04:05:38,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3211226.6666666665, ans=0.0 2023-11-26 04:05:40,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2023-11-26 04:05:44,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2023-11-26 04:05:46,373 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481700 2023-11-26 04:05:52,698 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 750, loss[loss=0.05922, simple_loss=0.07951, pruned_loss=0.008537, audio_tagging_loss=0.01092, over 15616.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09006, pruned_loss=0.0123, audio_tagging_loss=0.009157, over 2978417.25 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:06:03,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2023-11-26 04:06:10,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2023-11-26 04:06:15,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3211493.3333333335, ans=0.125 2023-11-26 04:06:23,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 8.760e+01 9.267e+01 1.006e+02 1.673e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 04:06:26,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3211560.0, ans=0.1 2023-11-26 04:06:32,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3211560.0, ans=0.04949747468305833 2023-11-26 04:06:32,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3211560.0, ans=0.125 2023-11-26 04:06:40,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. 
limit=15.0 2023-11-26 04:06:40,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3211626.6666666665, ans=0.125 2023-11-26 04:06:41,747 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481750 2023-11-26 04:06:48,719 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 800, loss[loss=0.05631, simple_loss=0.07172, pruned_loss=0.01047, audio_tagging_loss=0.009982, over 15410.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09009, pruned_loss=0.01239, audio_tagging_loss=0.009199, over 2994800.62 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:06:52,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3211693.3333333335, ans=0.125 2023-11-26 04:07:36,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3211960.0, ans=0.2 2023-11-26 04:07:37,224 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481800 2023-11-26 04:07:44,316 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 850, loss[loss=0.06669, simple_loss=0.08803, pruned_loss=0.01293, audio_tagging_loss=0.009741, over 15595.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08945, pruned_loss=0.01239, audio_tagging_loss=0.00929, over 3005309.74 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:07:47,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3212026.6666666665, ans=0.2 2023-11-26 04:07:49,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-26 04:07:55,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3212093.3333333335, ans=0.1 2023-11-26 04:08:14,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.719e+01 9.497e+01 1.051e+02 1.257e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 04:08:14,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3212160.0, ans=0.04949747468305833 2023-11-26 04:08:15,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3212160.0, ans=0.125 2023-11-26 04:08:20,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2023-11-26 04:08:25,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3212226.6666666665, ans=0.2 2023-11-26 04:08:28,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3212293.3333333335, ans=15.0 2023-11-26 04:08:32,638 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481850 2023-11-26 04:08:33,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3212293.3333333335, ans=0.04949747468305833 2023-11-26 04:08:38,952 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 900, loss[loss=0.06579, simple_loss=0.08493, pruned_loss=0.01243, audio_tagging_loss=0.0109, over 15632.00 frames. 
], tot_loss[loss=0.06616, simple_loss=0.08879, pruned_loss=0.01235, audio_tagging_loss=0.009408, over 3014694.19 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:08:52,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3212426.6666666665, ans=0.0 2023-11-26 04:09:05,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3212493.3333333335, ans=0.125 2023-11-26 04:09:17,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3212560.0, ans=0.2 2023-11-26 04:09:23,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3212626.6666666665, ans=0.0 2023-11-26 04:09:27,751 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481900 2023-11-26 04:09:31,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3212626.6666666665, ans=0.125 2023-11-26 04:09:34,169 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 950, loss[loss=0.08245, simple_loss=0.1, pruned_loss=0.02408, audio_tagging_loss=0.008364, over 15553.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08924, pruned_loss=0.01252, audio_tagging_loss=0.009253, over 3021589.59 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:09:41,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3212693.3333333335, ans=0.125 2023-11-26 04:09:42,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3212693.3333333335, ans=0.0 2023-11-26 04:09:57,835 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:10:04,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.675e+01 9.421e+01 1.013e+02 1.384e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 04:10:23,843 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481950 2023-11-26 04:10:30,176 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1000, loss[loss=0.06649, simple_loss=0.08096, pruned_loss=0.01556, audio_tagging_loss=0.01045, over 14958.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.0892, pruned_loss=0.0125, audio_tagging_loss=0.009048, over 3020805.79 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:10:43,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3213093.3333333335, ans=0.125 2023-11-26 04:10:54,190 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 04:11:00,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3213160.0, ans=0.125 2023-11-26 04:11:17,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3213293.3333333335, ans=0.0 2023-11-26 04:11:18,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-26 04:11:19,749 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482000 2023-11-26 04:11:20,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3213293.3333333335, ans=0.0 2023-11-26 04:11:26,291 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1050, loss[loss=0.05733, simple_loss=0.07757, pruned_loss=0.008247, audio_tagging_loss=0.0103, over 15562.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08806, pruned_loss=0.01239, audio_tagging_loss=0.008952, over 3025827.42 frames. ], batch size: 60, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:11:41,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3213426.6666666665, ans=0.1 2023-11-26 04:11:57,699 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.643e+01 9.285e+01 1.025e+02 1.343e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 04:12:13,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3213626.6666666665, ans=0.0 2023-11-26 04:12:16,620 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482050 2023-11-26 04:12:22,929 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1100, loss[loss=0.08901, simple_loss=0.1213, pruned_loss=0.02154, audio_tagging_loss=0.006811, over 15390.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08838, pruned_loss=0.0126, audio_tagging_loss=0.008812, over 3027921.53 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:12:25,133 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:13:11,055 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482100 2023-11-26 04:13:14,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.31 vs. limit=15.0 2023-11-26 04:13:17,890 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1150, loss[loss=0.06341, simple_loss=0.09001, pruned_loss=0.008687, audio_tagging_loss=0.009714, over 13914.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08885, pruned_loss=0.0127, audio_tagging_loss=0.008808, over 3025945.10 frames. 
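The recurring WARNING from train_asr.py:1481 above is a length sanity check: these AudioSet clips carry only the placeholder transcript, and a 1-second cut leaves 23 encoder frames after subsampling, fewer than its 24 tokens, so the transducer loss cannot align it. Below is a minimal sketch of that kind of filter; the helper name and the exact subsampling arithmetic are illustrative, not the literal icefall code:

```python
def keep_cut(num_frames: int, tokens: list, subsampling_factor: int = 4) -> bool:
    """Return False for cuts too short to train on.

    A transducer needs at least as many encoder frames as output tokens,
    so a cut whose post-subsampling length falls below its token count is
    excluded (hypothetical threshold; the real check lives in train_asr.py).
    """
    # 100 fbank frames -> 23 frames after ~4x subsampling with edge trimming,
    # matching "before subsampling: 100 / after subsampling: 23" in the log.
    frames_after = (num_frames - 7) // subsampling_factor
    return frames_after >= len(tokens)

# The excluded AudioSet cuts: 23 frames but 24 placeholder tokens.
assert keep_cut(100, ["tok"] * 24) is False
```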
], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:13:23,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3214026.6666666665, ans=0.125 2023-11-26 04:13:37,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3214093.3333333335, ans=0.0 2023-11-26 04:13:39,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3214160.0, ans=0.125 2023-11-26 04:13:40,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0 2023-11-26 04:13:48,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 8.737e+01 9.281e+01 9.829e+01 1.139e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 04:13:54,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3214226.6666666665, ans=0.125 2023-11-26 04:13:55,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2023-11-26 04:14:06,872 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482150 2023-11-26 04:14:13,199 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1200, loss[loss=0.06287, simple_loss=0.07876, pruned_loss=0.01196, audio_tagging_loss=0.01153, over 14879.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08907, pruned_loss=0.01267, audio_tagging_loss=0.008785, over 3027275.13 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:14:34,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3214493.3333333335, ans=0.125 2023-11-26 04:14:35,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3214493.3333333335, ans=0.125 2023-11-26 04:14:43,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3214493.3333333335, ans=0.125 2023-11-26 04:15:00,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3214626.6666666665, ans=0.125 2023-11-26 04:15:00,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3214626.6666666665, ans=0.0 2023-11-26 04:15:02,011 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482200 2023-11-26 04:15:06,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3214626.6666666665, ans=0.125 2023-11-26 04:15:09,168 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1250, loss[loss=0.08762, simple_loss=0.1267, pruned_loss=0.01739, audio_tagging_loss=0.006873, over 14721.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08926, pruned_loss=0.0127, audio_tagging_loss=0.008869, over 3032838.21 frames. 
], batch size: 53, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:15:39,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.848e+01 9.499e+01 1.001e+02 1.397e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 04:15:57,548 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482250 2023-11-26 04:16:03,852 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1300, loss[loss=0.05466, simple_loss=0.07883, pruned_loss=0.009313, audio_tagging_loss=0.005934, over 14783.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0888, pruned_loss=0.01242, audio_tagging_loss=0.008774, over 3032296.60 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:16:12,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3215026.6666666665, ans=0.125 2023-11-26 04:16:29,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3215160.0, ans=0.125 2023-11-26 04:16:34,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3215160.0, ans=0.95 2023-11-26 04:16:34,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215160.0, ans=0.1 2023-11-26 04:16:40,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3215226.6666666665, ans=0.0 2023-11-26 04:16:42,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3215226.6666666665, ans=0.125 2023-11-26 04:16:45,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=12.0 2023-11-26 04:16:53,415 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482300 2023-11-26 04:16:53,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215293.3333333335, ans=0.1 2023-11-26 04:17:00,352 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1350, loss[loss=0.05652, simple_loss=0.07712, pruned_loss=0.008812, audio_tagging_loss=0.009144, over 14595.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08909, pruned_loss=0.01243, audio_tagging_loss=0.008768, over 3035326.73 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:17:12,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3215426.6666666665, ans=0.09899494936611666 2023-11-26 04:17:19,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3215426.6666666665, ans=0.125 2023-11-26 04:17:23,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.40 vs. limit=8.0 2023-11-26 04:17:24,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. 
limit=6.0 2023-11-26 04:17:31,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.487e+01 8.991e+01 9.732e+01 2.025e+02, threshold=1.798e+02, percent-clipped=1.0 2023-11-26 04:17:41,015 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:17:49,429 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482350 2023-11-26 04:17:50,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215626.6666666665, ans=0.1 2023-11-26 04:17:56,839 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1400, loss[loss=0.07872, simple_loss=0.09777, pruned_loss=0.01626, audio_tagging_loss=0.01357, over 15216.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08898, pruned_loss=0.01246, audio_tagging_loss=0.008832, over 3044967.29 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:18:05,459 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:18:09,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3215760.0, ans=0.0 2023-11-26 04:18:14,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2023-11-26 04:18:15,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3215760.0, ans=0.0 2023-11-26 04:18:35,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3215893.3333333335, ans=0.125 2023-11-26 04:18:38,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3215893.3333333335, ans=0.0 2023-11-26 04:18:45,871 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482400 2023-11-26 04:18:52,444 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1450, loss[loss=0.08872, simple_loss=0.125, pruned_loss=0.01585, audio_tagging_loss=0.01038, over 16123.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08931, pruned_loss=0.01251, audio_tagging_loss=0.008865, over 3051465.01 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:18:52,676 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:19:17,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3216160.0, ans=0.2 2023-11-26 04:19:20,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3216160.0, ans=0.09899494936611666 2023-11-26 04:19:20,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.96 vs. 
limit=15.0 2023-11-26 04:19:24,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.656e+01 9.210e+01 9.975e+01 1.432e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:19:31,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3216226.6666666665, ans=0.125 2023-11-26 04:19:35,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3216226.6666666665, ans=0.2 2023-11-26 04:19:41,271 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482450 2023-11-26 04:19:48,065 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1500, loss[loss=0.0688, simple_loss=0.09708, pruned_loss=0.009503, audio_tagging_loss=0.01076, over 15075.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08913, pruned_loss=0.01246, audio_tagging_loss=0.009022, over 3048115.13 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:19:48,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0 2023-11-26 04:20:16,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3216493.3333333335, ans=0.0 2023-11-26 04:20:16,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3216493.3333333335, ans=0.125 2023-11-26 04:20:17,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3216493.3333333335, ans=0.125 2023-11-26 04:20:20,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3216560.0, ans=0.2 2023-11-26 04:20:23,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3216560.0, ans=0.0 2023-11-26 04:20:37,473 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482500 2023-11-26 04:20:44,831 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1550, loss[loss=0.07966, simple_loss=0.1102, pruned_loss=0.01721, audio_tagging_loss=0.007371, over 15369.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08946, pruned_loss=0.01258, audio_tagging_loss=0.00914, over 3053302.67 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:20:57,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3216760.0, ans=0.125 2023-11-26 04:21:03,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3216760.0, ans=0.2 2023-11-26 04:21:11,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.48 vs. 
limit=15.0 2023-11-26 04:21:12,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3216826.6666666665, ans=0.0 2023-11-26 04:21:15,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.764e+01 9.258e+01 1.010e+02 1.215e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 04:21:22,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3216893.3333333335, ans=0.125 2023-11-26 04:21:33,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482550 2023-11-26 04:21:40,016 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1600, loss[loss=0.06638, simple_loss=0.08867, pruned_loss=0.01155, audio_tagging_loss=0.0105, over 14089.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.0894, pruned_loss=0.01253, audio_tagging_loss=0.009243, over 3053094.48 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:21:44,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3217026.6666666665, ans=0.04949747468305833 2023-11-26 04:21:54,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3217093.3333333335, ans=0.0 2023-11-26 04:22:02,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2023-11-26 04:22:12,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3217160.0, ans=0.0 2023-11-26 04:22:13,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2023-11-26 04:22:28,858 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482600 2023-11-26 04:22:36,040 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1650, loss[loss=0.04584, simple_loss=0.05297, pruned_loss=0.006515, audio_tagging_loss=0.01284, over 15768.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08902, pruned_loss=0.01247, audio_tagging_loss=0.009283, over 3056781.68 frames. ], batch size: 64, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:22:43,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3217360.0, ans=0.0 2023-11-26 04:22:53,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3217426.6666666665, ans=0.0 2023-11-26 04:22:54,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217426.6666666665, ans=0.1 2023-11-26 04:23:04,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.63 vs. limit=10.0 2023-11-26 04:23:06,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. 
limit=5.0 2023-11-26 04:23:07,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.467e+01 9.120e+01 9.826e+01 1.173e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 04:23:21,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3217626.6666666665, ans=0.1 2023-11-26 04:23:21,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.61 vs. limit=15.0 2023-11-26 04:23:22,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-11-26 04:23:23,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.52 vs. limit=12.0 2023-11-26 04:23:24,375 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482650 2023-11-26 04:23:26,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3217626.6666666665, ans=0.125 2023-11-26 04:23:31,212 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1700, loss[loss=0.08671, simple_loss=0.1221, pruned_loss=0.01724, audio_tagging_loss=0.008411, over 15288.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08962, pruned_loss=0.01238, audio_tagging_loss=0.009202, over 3054771.61 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:23:35,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3217693.3333333335, ans=0.0 2023-11-26 04:23:43,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3217760.0, ans=0.0 2023-11-26 04:23:59,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0 2023-11-26 04:24:03,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3217893.3333333335, ans=0.0 2023-11-26 04:24:11,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217893.3333333335, ans=0.1 2023-11-26 04:24:17,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3217960.0, ans=0.1 2023-11-26 04:24:19,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3217960.0, ans=0.0 2023-11-26 04:24:20,451 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482700 2023-11-26 04:24:24,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3217960.0, ans=0.035 2023-11-26 04:24:25,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.33 vs. limit=22.5 2023-11-26 04:24:26,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. 
limit=15.0 2023-11-26 04:24:26,775 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1750, loss[loss=0.06152, simple_loss=0.08025, pruned_loss=0.01365, audio_tagging_loss=0.007748, over 14974.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08919, pruned_loss=0.0123, audio_tagging_loss=0.009102, over 3048868.84 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:24:31,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2023-11-26 04:24:59,627 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.710e+01 9.428e+01 1.004e+02 1.247e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 04:25:11,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3218293.3333333335, ans=0.2 2023-11-26 04:25:13,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-26 04:25:15,514 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482750 2023-11-26 04:25:22,300 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1800, loss[loss=0.05318, simple_loss=0.0753, pruned_loss=0.007993, audio_tagging_loss=0.007535, over 14443.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08973, pruned_loss=0.01252, audio_tagging_loss=0.008929, over 3047139.71 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:25:25,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3218360.0, ans=0.125 2023-11-26 04:25:35,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3218426.6666666665, ans=0.0 2023-11-26 04:25:52,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3218493.3333333335, ans=0.125 2023-11-26 04:26:07,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3218626.6666666665, ans=0.1 2023-11-26 04:26:11,618 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482800 2023-11-26 04:26:16,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2023-11-26 04:26:18,151 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1850, loss[loss=0.07747, simple_loss=0.1078, pruned_loss=0.01585, audio_tagging_loss=0.007692, over 14921.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09048, pruned_loss=0.01254, audio_tagging_loss=0.008782, over 3043758.99 frames. 
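Each optim.py:476 line summarizes a window of recent gradient norms as five quantiles (min, Q1, median, Q3, max) plus the active clipping threshold; everywhere in this log the threshold equals Clipping_scale (2.0) times the reported median (e.g. 2.0 x 9.428e+01 = 1.886e+02), and percent-clipped reports how often the cap actually bit. A sketch of that bookkeeping in plain PyTorch, with the window construction assumed rather than taken from icefall:

```python
import torch

def grad_norm_stats(norms: torch.Tensor, clipping_scale: float = 2.0):
    """Quantiles of recent grad norms plus the derived clipping threshold."""
    quartiles = torch.quantile(
        norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
    )
    threshold = clipping_scale * quartiles[2]        # 2.0 x median, as logged
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped

norms = 90.0 + 10.0 * torch.randn(128).abs()         # fake window of norms
q, thr, pct = grad_norm_stats(norms)
print(f"grad-norm quartiles {q.tolist()}, threshold={thr:.4g}, "
      f"percent-clipped={pct:.1f}")
```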
], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:26:28,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3218693.3333333335, ans=0.125 2023-11-26 04:26:39,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3218760.0, ans=0.1 2023-11-26 04:26:39,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3218760.0, ans=0.125 2023-11-26 04:26:51,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.663e+01 9.346e+01 1.025e+02 1.313e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 04:26:53,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3218893.3333333335, ans=0.1 2023-11-26 04:26:55,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2023-11-26 04:26:59,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.22 vs. limit=12.0 2023-11-26 04:27:03,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3218960.0, ans=0.125 2023-11-26 04:27:05,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3218960.0, ans=0.125 2023-11-26 04:27:08,515 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482850 2023-11-26 04:27:15,284 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1900, loss[loss=0.05793, simple_loss=0.07781, pruned_loss=0.009912, audio_tagging_loss=0.009112, over 16325.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09036, pruned_loss=0.01261, audio_tagging_loss=0.008709, over 3056989.53 frames. ], batch size: 63, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:27:23,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2023-11-26 04:27:35,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3219093.3333333335, ans=0.0 2023-11-26 04:27:47,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:27:47,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:27:52,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:27:55,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:28:04,386 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482900 2023-11-26 04:28:11,352 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1950, loss[loss=0.05819, simple_loss=0.07697, pruned_loss=0.008217, audio_tagging_loss=0.01149, over 14895.00 frames. 
], tot_loss[loss=0.06651, simple_loss=0.09042, pruned_loss=0.0126, audio_tagging_loss=0.008701, over 3051966.53 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:28:15,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3219360.0, ans=0.1 2023-11-26 04:28:43,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.481e+01 9.198e+01 9.869e+01 1.193e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 04:29:00,261 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482950 2023-11-26 04:29:04,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3219626.6666666665, ans=0.125 2023-11-26 04:29:05,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3219693.3333333335, ans=0.125 2023-11-26 04:29:06,473 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2000, loss[loss=0.0818, simple_loss=0.1149, pruned_loss=0.01818, audio_tagging_loss=0.006183, over 14533.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08946, pruned_loss=0.01238, audio_tagging_loss=0.008755, over 3038301.20 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:29:06,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3219693.3333333335, ans=0.125 2023-11-26 04:29:12,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3219693.3333333335, ans=0.1 2023-11-26 04:29:56,801 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483000 2023-11-26 04:30:03,483 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2050, loss[loss=0.08042, simple_loss=0.1186, pruned_loss=0.0135, audio_tagging_loss=0.007603, over 15321.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08987, pruned_loss=0.01237, audio_tagging_loss=0.008704, over 3042867.93 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:30:21,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.12 vs. limit=15.0 2023-11-26 04:30:29,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3220160.0, ans=0.125 2023-11-26 04:30:29,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=15.0 2023-11-26 04:30:31,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.79 vs. 
limit=15.0 2023-11-26 04:30:36,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.605e+01 9.268e+01 1.003e+02 1.182e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 04:30:42,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3220226.6666666665, ans=0.125 2023-11-26 04:30:53,348 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483050 2023-11-26 04:30:57,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3220293.3333333335, ans=0.125 2023-11-26 04:30:59,850 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2100, loss[loss=0.04689, simple_loss=0.06212, pruned_loss=0.006395, audio_tagging_loss=0.009438, over 13555.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08928, pruned_loss=0.01236, audio_tagging_loss=0.008719, over 3043387.72 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:31:17,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3220426.6666666665, ans=0.2 2023-11-26 04:31:35,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-26 04:31:41,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3220560.0, ans=0.125 2023-11-26 04:31:45,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3220626.6666666665, ans=0.0 2023-11-26 04:31:46,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3220626.6666666665, ans=0.0 2023-11-26 04:31:49,132 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483100 2023-11-26 04:31:55,336 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2150, loss[loss=0.05085, simple_loss=0.06635, pruned_loss=0.007913, audio_tagging_loss=0.009762, over 16499.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08903, pruned_loss=0.01232, audio_tagging_loss=0.008726, over 3042852.20 frames. ], batch size: 65, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:31:56,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3220693.3333333335, ans=0.0 2023-11-26 04:32:28,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.606e+01 9.465e+01 1.020e+02 1.219e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 04:32:29,694 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:32:41,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. 
limit=6.0 2023-11-26 04:32:45,743 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483150 2023-11-26 04:32:52,090 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2200, loss[loss=0.04529, simple_loss=0.06628, pruned_loss=0.004425, audio_tagging_loss=0.007722, over 15141.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.0895, pruned_loss=0.01247, audio_tagging_loss=0.008805, over 3043974.93 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:32:59,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3221026.6666666665, ans=0.125 2023-11-26 04:33:04,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3221093.3333333335, ans=0.025 2023-11-26 04:33:07,889 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:33:08,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3221093.3333333335, ans=0.0 2023-11-26 04:33:18,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3221160.0, ans=0.02 2023-11-26 04:33:27,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.92 vs. limit=10.0 2023-11-26 04:33:30,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3221226.6666666665, ans=0.125 2023-11-26 04:33:41,021 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483200 2023-11-26 04:33:44,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3221293.3333333335, ans=0.0 2023-11-26 04:33:47,554 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2250, loss[loss=0.06657, simple_loss=0.09771, pruned_loss=0.009802, audio_tagging_loss=0.007908, over 15137.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09042, pruned_loss=0.01267, audio_tagging_loss=0.008796, over 3042988.50 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:34:13,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3221493.3333333335, ans=0.125 2023-11-26 04:34:21,559 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.817e+01 9.211e+01 9.808e+01 1.275e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:34:22,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3221560.0, ans=0.125 2023-11-26 04:34:34,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3221626.6666666665, ans=0.0 2023-11-26 04:34:35,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3221626.6666666665, ans=0.0 2023-11-26 04:34:37,725 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483250 2023-11-26 04:34:44,125 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2300, loss[loss=0.07261, simple_loss=0.09625, pruned_loss=0.01392, audio_tagging_loss=0.01057, over 14781.00 frames. 
], tot_loss[loss=0.06651, simple_loss=0.09006, pruned_loss=0.01266, audio_tagging_loss=0.008823, over 3045689.14 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:34:49,699 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:34:55,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3221760.0, ans=0.125 2023-11-26 04:35:32,592 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:35:32,652 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483300 2023-11-26 04:35:37,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3221960.0, ans=0.125 2023-11-26 04:35:40,085 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2350, loss[loss=0.06875, simple_loss=0.09349, pruned_loss=0.01385, audio_tagging_loss=0.008159, over 15161.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08971, pruned_loss=0.01271, audio_tagging_loss=0.008948, over 3042992.11 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:35:52,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3222093.3333333335, ans=0.04949747468305833 2023-11-26 04:36:13,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.775e+01 9.413e+01 9.957e+01 1.252e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 04:36:14,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3222226.6666666665, ans=0.09899494936611666 2023-11-26 04:36:15,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3222226.6666666665, ans=0.2 2023-11-26 04:36:29,216 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483350 2023-11-26 04:36:32,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3222293.3333333335, ans=0.125 2023-11-26 04:36:35,652 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2400, loss[loss=0.072, simple_loss=0.1064, pruned_loss=0.01144, audio_tagging_loss=0.007355, over 16049.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.0905, pruned_loss=0.0128, audio_tagging_loss=0.009025, over 3046669.63 frames. 
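Most of the scaling.py:213 traffic traces ScheduledFloat values: regularizer hyperparameters (balancer probabilities, skip rates, scale floors) that decay as a function of a running batch_count rather than staying constant. By this point in training nearly all of them sit at their final plateau (prob=0.125, skip_rate=0.0, and so on). A minimal piecewise-linear stand-in, with illustrative breakpoints; the real ScheduledFloat in icefall's scaling.py carries more machinery:

```python
class ScheduledFloat:
    """A float that interpolates linearly between (batch_count, value) points."""

    def __init__(self, *points):
        self.points = sorted(points)          # e.g. (0, 0.3), (20000, 0.125)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]                 # final plateau
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)

prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
print(prob.value(3222426.67))                 # -> 0.125, as logged above
```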
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:36:52,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3222426.6666666665, ans=0.125 2023-11-26 04:36:59,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3222493.3333333335, ans=0.1 2023-11-26 04:37:02,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3222493.3333333335, ans=0.125 2023-11-26 04:37:07,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3222493.3333333335, ans=0.125 2023-11-26 04:37:07,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3222493.3333333335, ans=0.0 2023-11-26 04:37:10,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0 2023-11-26 04:37:18,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=22.5 2023-11-26 04:37:24,487 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483400 2023-11-26 04:37:32,127 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2450, loss[loss=0.04864, simple_loss=0.06121, pruned_loss=0.006788, audio_tagging_loss=0.01124, over 15639.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08985, pruned_loss=0.01261, audio_tagging_loss=0.00912, over 3049804.12 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:37:34,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3222693.3333333335, ans=0.0 2023-11-26 04:37:58,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3222826.6666666665, ans=0.125 2023-11-26 04:38:05,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 8.820e+01 9.460e+01 9.914e+01 1.229e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 04:38:12,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3222893.3333333335, ans=0.125 2023-11-26 04:38:15,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3222960.0, ans=0.125 2023-11-26 04:38:20,996 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483450 2023-11-26 04:38:26,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2023-11-26 04:38:28,473 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2500, loss[loss=0.05583, simple_loss=0.07533, pruned_loss=0.008631, audio_tagging_loss=0.009534, over 14335.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08943, pruned_loss=0.01257, audio_tagging_loss=0.009108, over 3046677.96 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:38:48,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. 
limit=6.0 2023-11-26 04:39:07,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3223226.6666666665, ans=0.2 2023-11-26 04:39:17,452 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483500 2023-11-26 04:39:23,661 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2550, loss[loss=0.07846, simple_loss=0.1158, pruned_loss=0.01558, audio_tagging_loss=0.004955, over 15913.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08915, pruned_loss=0.01247, audio_tagging_loss=0.009055, over 3045562.91 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:39:48,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3223493.3333333335, ans=0.125 2023-11-26 04:39:58,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.655e+01 9.369e+01 9.898e+01 1.233e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 04:39:58,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3223560.0, ans=0.125 2023-11-26 04:40:04,740 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:40:13,210 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483550 2023-11-26 04:40:20,122 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2600, loss[loss=0.05639, simple_loss=0.06662, pruned_loss=0.01112, audio_tagging_loss=0.01195, over 14233.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08835, pruned_loss=0.01215, audio_tagging_loss=0.009006, over 3044637.67 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:40:25,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3223693.3333333335, ans=0.1 2023-11-26 04:40:29,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3223693.3333333335, ans=0.125 2023-11-26 04:40:29,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3223693.3333333335, ans=0.0 2023-11-26 04:40:43,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-26 04:41:06,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3223960.0, ans=0.0 2023-11-26 04:41:09,395 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483600 2023-11-26 04:41:17,197 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2650, loss[loss=0.04925, simple_loss=0.05524, pruned_loss=0.01001, audio_tagging_loss=0.01163, over 13724.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08826, pruned_loss=0.0122, audio_tagging_loss=0.008894, over 3043519.48 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:41:39,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3224160.0, ans=15.0 2023-11-26 04:41:50,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 8.492e+01 9.203e+01 1.002e+02 1.237e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 04:41:53,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3224226.6666666665, ans=0.125 2023-11-26 04:42:03,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3224293.3333333335, ans=0.125 2023-11-26 04:42:06,585 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483650 2023-11-26 04:42:07,725 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:42:11,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-26 04:42:12,941 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2700, loss[loss=0.07203, simple_loss=0.101, pruned_loss=0.01308, audio_tagging_loss=0.008469, over 15657.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08919, pruned_loss=0.01235, audio_tagging_loss=0.008881, over 3041252.68 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:42:13,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3224360.0, ans=0.2 2023-11-26 04:42:13,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3224360.0, ans=0.0 2023-11-26 04:42:15,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3224360.0, ans=0.125 2023-11-26 04:42:25,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3224426.6666666665, ans=0.125 2023-11-26 04:42:31,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3224426.6666666665, ans=15.0 2023-11-26 04:42:45,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3224493.3333333335, ans=0.0 2023-11-26 04:42:49,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=22.5 2023-11-26 04:42:49,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3224560.0, ans=0.125 2023-11-26 04:42:52,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0 2023-11-26 04:42:53,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. 
limit=10.0 2023-11-26 04:42:55,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3224560.0, ans=0.125 2023-11-26 04:43:02,261 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483700 2023-11-26 04:43:08,495 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2750, loss[loss=0.05868, simple_loss=0.07963, pruned_loss=0.00995, audio_tagging_loss=0.008915, over 15292.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08899, pruned_loss=0.01241, audio_tagging_loss=0.008922, over 3039078.38 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:43:29,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-26 04:43:30,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3224760.0, ans=0.125 2023-11-26 04:43:35,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3224826.6666666665, ans=0.07 2023-11-26 04:43:43,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.913e+01 9.370e+01 9.874e+01 1.312e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 04:43:55,852 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:43:57,983 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483750 2023-11-26 04:44:04,801 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2800, loss[loss=0.07548, simple_loss=0.1084, pruned_loss=0.014, audio_tagging_loss=0.007281, over 15885.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.0884, pruned_loss=0.01236, audio_tagging_loss=0.008948, over 3035539.63 frames. 
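The scaling.py:1022 Whitening lines each compare a per-module statistic against a limit ("metric=X vs. limit=Y"); the whitener only pushes back on a module's activations once the metric exceeds its limit. The metric gauges covariance anisotropy: it is near 1.0 when the channel covariance eigenvalues are all equal and grows as variance concentrates in a few directions. A single-group sketch under that assumed definition (the logged num_groups=4 and num_groups=8 cases apply the same measure per channel group):

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (frames, channels). Roughly 1.0 for white features; grows as
    variance concentrates in fewer directions (assumed metric, see text)."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)          # eigenvalues of the covariance
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

feats = torch.randn(1000, 192)                 # near-white: metric close to 1
print(whitening_metric(feats))
skewed = feats * torch.linspace(0.1, 3.0, 192) # anisotropic: larger metric
print(whitening_metric(skewed))
```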
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:44:12,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3225026.6666666665, ans=0.125 2023-11-26 04:44:17,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3225093.3333333335, ans=0.05 2023-11-26 04:44:29,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3225160.0, ans=0.125 2023-11-26 04:44:32,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3225160.0, ans=10.0 2023-11-26 04:44:37,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3225226.6666666665, ans=0.0 2023-11-26 04:44:43,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3225226.6666666665, ans=0.125 2023-11-26 04:44:55,117 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483800 2023-11-26 04:45:01,807 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2850, loss[loss=0.03789, simple_loss=0.04087, pruned_loss=0.00468, audio_tagging_loss=0.01277, over 16703.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08818, pruned_loss=0.01229, audio_tagging_loss=0.008908, over 3040871.56 frames. ], batch size: 66, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:45:05,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3225360.0, ans=0.125 2023-11-26 04:45:10,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3225360.0, ans=0.125 2023-11-26 04:45:29,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3225493.3333333335, ans=0.2 2023-11-26 04:45:36,988 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.846e+01 9.347e+01 1.008e+02 1.244e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 04:45:50,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483850 2023-11-26 04:45:57,146 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2900, loss[loss=0.05617, simple_loss=0.0795, pruned_loss=0.00813, audio_tagging_loss=0.008293, over 16133.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08846, pruned_loss=0.0124, audio_tagging_loss=0.008787, over 3040595.96 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:46:06,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.42 vs. 
limit=15.0 2023-11-26 04:46:08,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3225760.0, ans=22.5 2023-11-26 04:46:25,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3225826.6666666665, ans=0.125 2023-11-26 04:46:46,749 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483900 2023-11-26 04:46:46,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3225960.0, ans=0.1 2023-11-26 04:46:52,993 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2950, loss[loss=0.04698, simple_loss=0.04608, pruned_loss=0.009452, audio_tagging_loss=0.01449, over 14874.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08903, pruned_loss=0.01237, audio_tagging_loss=0.008879, over 3050316.91 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:46:53,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-11-26 04:46:56,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3226026.6666666665, ans=0.125 2023-11-26 04:47:01,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3226026.6666666665, ans=0.1 2023-11-26 04:47:09,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3226093.3333333335, ans=0.0 2023-11-26 04:47:10,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3226093.3333333335, ans=0.1 2023-11-26 04:47:21,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3226160.0, ans=0.0 2023-11-26 04:47:27,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2023-11-26 04:47:27,904 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 8.828e+01 9.406e+01 1.023e+02 1.338e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 04:47:42,878 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483950 2023-11-26 04:47:47,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3226293.3333333335, ans=0.0 2023-11-26 04:47:49,821 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3000, loss[loss=0.06398, simple_loss=0.08976, pruned_loss=0.01199, audio_tagging_loss=0.007115, over 14601.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08983, pruned_loss=0.01253, audio_tagging_loss=0.008817, over 3056430.88 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:47:49,822 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 04:48:20,319 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5109, 3.2902, 3.8524, 3.6699], device='cuda:1') 2023-11-26 04:48:22,226 INFO [train_asr.py:1267] (1/4) Epoch 41, validation: loss=0.05755, simple_loss=0.05064, pruned_loss=0.005227, audio_tagging_loss=0.02701, over 4681554.00 frames. 
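At batch 3000 the trainer pauses, as at every validation point, for a full pass over the fixed dev set (hence the identical "over 4681554.00 frames" total at each validation) and then reports peak GPU memory. A sketch of that bookkeeping, assuming a generic PyTorch loop with an MSE stand-in for the real transducer-plus-tagging loss:

```python
import logging
import torch
import torch.nn.functional as F

def compute_validation_loss(model, valid_loader, device):
    """Average loss per frame over the whole (fixed) dev set."""
    total, frames = 0.0, 0
    with torch.no_grad():
        for feats, targets in valid_loader:           # feats: (B, T, C)
            out = model(feats.to(device))
            total += F.mse_loss(out, targets.to(device), reduction="sum").item()
            frames += feats.numel() // feats.shape[-1]  # B * T frames
    return total / max(frames, 1)

def maybe_validate(model, valid_loader, batch_idx, device, valid_interval=3000):
    """Every `valid_interval` batches: validate, then report peak memory."""
    if batch_idx % valid_interval != 0:
        return
    logging.info("Computing validation loss")
    model.eval()
    loss = compute_validation_loss(model, valid_loader, device)
    model.train()
    logging.info(f"validation: loss={loss:.4g}")
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 ** 2)
    logging.info(f"Maximum memory allocated so far is {peak_mb}MB")
```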
2023-11-26 04:48:22,227 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 04:48:24,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3226360.0, ans=0.125 2023-11-26 04:48:49,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=12.0 2023-11-26 04:49:11,274 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484000 2023-11-26 04:49:20,476 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3050, loss[loss=0.06506, simple_loss=0.08364, pruned_loss=0.01354, audio_tagging_loss=0.0097, over 16104.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09032, pruned_loss=0.01259, audio_tagging_loss=0.008908, over 3054120.33 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:49:20,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3226693.3333333335, ans=0.125 2023-11-26 04:49:24,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2023-11-26 04:49:51,996 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:49:54,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3226893.3333333335, ans=0.0 2023-11-26 04:49:55,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.733e+01 9.255e+01 1.004e+02 1.259e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 04:49:59,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3226893.3333333335, ans=0.0 2023-11-26 04:50:03,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3226893.3333333335, ans=0.0 2023-11-26 04:50:07,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3226960.0, ans=0.0 2023-11-26 04:50:10,298 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484050 2023-11-26 04:50:15,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.78 vs. limit=10.0 2023-11-26 04:50:17,069 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3100, loss[loss=0.0763, simple_loss=0.1106, pruned_loss=0.01135, audio_tagging_loss=0.009637, over 16103.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09071, pruned_loss=0.01266, audio_tagging_loss=0.008964, over 3050254.01 frames. 
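], batch size: 58, lr: 1.65e-03, grad_scale: 32.0

The train_asr.py:1268 record above reports the peak CUDA allocation after each validation pass (25568MB here). A minimal sketch of how such a figure is obtained with the standard torch API; the helper name is made up, and the exact icefall call site may differ.

```python
import torch

def log_peak_memory(device: torch.device) -> None:
    # peak bytes allocated on this device since tracking began
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")

if torch.cuda.is_available():
    log_peak_memory(torch.device("cuda:1"))
```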
2023-11-26 04:50:17,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3227026.6666666665, ans=0.5 2023-11-26 04:50:18,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3227026.6666666665, ans=0.1 2023-11-26 04:50:42,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3227160.0, ans=0.125 2023-11-26 04:50:44,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0 2023-11-26 04:51:06,107 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484100 2023-11-26 04:51:09,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3227293.3333333335, ans=0.125 2023-11-26 04:51:12,461 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3150, loss[loss=0.05406, simple_loss=0.07286, pruned_loss=0.008553, audio_tagging_loss=0.009078, over 14964.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09016, pruned_loss=0.01246, audio_tagging_loss=0.009083, over 3039583.08 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:51:18,444 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:51:25,149 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:51:32,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3227426.6666666665, ans=0.0 2023-11-26 04:51:41,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3227493.3333333335, ans=0.1 2023-11-26 04:51:48,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.901e+01 8.686e+01 9.278e+01 1.012e+02 1.304e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 04:51:49,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3227560.0, ans=0.125 2023-11-26 04:51:55,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3227560.0, ans=0.0 2023-11-26 04:52:01,922 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484150 2023-11-26 04:52:07,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3227693.3333333335, ans=0.5 2023-11-26 04:52:08,300 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3200, loss[loss=0.08482, simple_loss=0.1219, pruned_loss=0.01839, audio_tagging_loss=0.005493, over 14752.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09026, pruned_loss=0.01243, audio_tagging_loss=0.009071, over 3037139.56 frames.
], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:52:19,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3227760.0, ans=0.0 2023-11-26 04:52:21,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-11-26 04:52:34,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3227826.6666666665, ans=0.1 2023-11-26 04:52:48,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3227893.3333333335, ans=0.125 2023-11-26 04:52:56,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0 2023-11-26 04:52:57,678 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484200 2023-11-26 04:53:04,803 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3250, loss[loss=0.07227, simple_loss=0.1032, pruned_loss=0.01386, audio_tagging_loss=0.006826, over 15247.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.0898, pruned_loss=0.01246, audio_tagging_loss=0.009139, over 3035450.35 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:53:13,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3228026.6666666665, ans=0.2 2023-11-26 04:53:28,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3228160.0, ans=0.1 2023-11-26 04:53:29,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3228160.0, ans=0.125 2023-11-26 04:53:32,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3228160.0, ans=0.125 2023-11-26 04:53:34,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3228160.0, ans=0.125 2023-11-26 04:53:38,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3228226.6666666665, ans=0.2 2023-11-26 04:53:40,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 8.676e+01 9.295e+01 9.800e+01 1.223e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 04:53:44,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3228226.6666666665, ans=0.0 2023-11-26 04:53:51,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3228293.3333333335, ans=0.07 2023-11-26 04:53:54,343 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484250 2023-11-26 04:54:00,672 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3300, loss[loss=0.06653, simple_loss=0.0884, pruned_loss=0.01205, audio_tagging_loss=0.01028, over 16053.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08974, pruned_loss=0.01261, audio_tagging_loss=0.009238, over 3041726.60 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:54:08,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3228360.0, ans=0.1 2023-11-26 04:54:12,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-26 04:54:14,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3228426.6666666665, ans=0.1 2023-11-26 04:54:21,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2023-11-26 04:54:31,230 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:54:34,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3228560.0, ans=0.1 2023-11-26 04:54:34,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=15.0 2023-11-26 04:54:50,298 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484300 2023-11-26 04:54:56,667 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3350, loss[loss=0.06829, simple_loss=0.09415, pruned_loss=0.0139, audio_tagging_loss=0.00732, over 15495.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09064, pruned_loss=0.01288, audio_tagging_loss=0.009066, over 3048771.81 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:55:20,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3228826.6666666665, ans=0.125 2023-11-26 04:55:32,663 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.894e+01 9.635e+01 1.028e+02 1.225e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-26 04:55:44,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2023-11-26 04:55:46,129 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484350 2023-11-26 04:55:50,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3228960.0, ans=0.125 2023-11-26 04:55:52,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3229026.6666666665, ans=0.2 2023-11-26 04:55:52,828 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3400, loss[loss=0.0745, simple_loss=0.1092, pruned_loss=0.01344, audio_tagging_loss=0.006453, over 15338.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09086, pruned_loss=0.01289, audio_tagging_loss=0.008914, over 3050383.70 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:56:41,954 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484400 2023-11-26 04:56:44,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3229293.3333333335, ans=0.05 2023-11-26 04:56:49,097 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3450, loss[loss=0.04932, simple_loss=0.06205, pruned_loss=0.006714, audio_tagging_loss=0.01158, over 16741.00 frames. 
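], tot_loss[loss=0.06704, simple_loss=0.09055, pruned_loss=0.01285, audio_tagging_loss=0.008906, over 3052035.56 frames. ], batch size: 64, lr: 1.65e-03, grad_scale: 32.0

In these records, loss[...] is the current batch and tot_loss[...] a frame-weighted running average over recent batches; the fractional frame counts (e.g. "over 3052035.56 frames") suggest an exponentially decayed accumulator rather than a plain sum. A hypothetical sketch of that bookkeeping; the class, its name, and the decay constant are assumptions, not icefall's actual tracker.

```python
class RunningLoss:
    """Frame-weighted, exponentially decayed loss average."""

    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: int) -> None:
        # old statistics fade out geometrically as new batches arrive
        self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)
```

With a decay just below 1.0, the accumulated frame count settles around a steady state instead of growing without bound, which would explain why the reported totals hover near 3.05M frames batch after batch.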
2023-11-26 04:56:49,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=12.0 2023-11-26 04:56:52,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3229360.0, ans=0.0 2023-11-26 04:57:02,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3229426.6666666665, ans=0.5 2023-11-26 04:57:18,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3229493.3333333335, ans=0.0 2023-11-26 04:57:24,870 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.632e+01 9.209e+01 1.007e+02 1.265e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:57:38,931 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484450 2023-11-26 04:57:45,205 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3500, loss[loss=0.05475, simple_loss=0.07233, pruned_loss=0.007704, audio_tagging_loss=0.01089, over 13108.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08947, pruned_loss=0.01266, audio_tagging_loss=0.008954, over 3046577.09 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:58:00,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3229760.0, ans=0.0 2023-11-26 04:58:04,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3229760.0, ans=0.0 2023-11-26 04:58:09,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5 2023-11-26 04:58:09,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3229826.6666666665, ans=0.125 2023-11-26 04:58:12,795 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
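Number of tokens: 24

The WARNING above drops an AudioSet cut whose 1-second clip yields only 23 encoder frames after subsampling, fewer than its 24 BPE tokens (the "Dummy text" transcript is a placeholder, since these tagging clips carry no real transcription). A transducer alignment here needs at least one frame per emitted token, so such a cut cannot contribute a valid ASR loss. A sketch of that validity check follows; the function name is an assumption, and the real filter may apply further criteria.

```python
def is_valid_for_transducer(num_frames_after_subsampling: int,
                            num_tokens: int) -> bool:
    # the alignment must emit every token somewhere in the frame sequence,
    # so we need at least as many frames as tokens
    return num_frames_after_subsampling >= num_tokens

print(is_valid_for_transducer(23, 24))  # False -> excluded, as in the warning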
2023-11-26 04:58:16,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3229826.6666666665, ans=0.0 2023-11-26 04:58:17,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3229893.3333333335, ans=0.05 2023-11-26 04:58:27,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3229893.3333333335, ans=0.0 2023-11-26 04:58:32,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3229960.0, ans=0.125 2023-11-26 04:58:34,880 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484500 2023-11-26 04:58:41,685 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3550, loss[loss=0.06114, simple_loss=0.08414, pruned_loss=0.01137, audio_tagging_loss=0.007697, over 14303.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08997, pruned_loss=0.01258, audio_tagging_loss=0.008964, over 3048012.44 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:58:42,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3230026.6666666665, ans=0.1 2023-11-26 04:58:54,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0 2023-11-26 04:59:04,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3230160.0, ans=0.0 2023-11-26 04:59:13,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3230226.6666666665, ans=0.125 2023-11-26 04:59:18,487 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.467e+01 9.253e+01 9.852e+01 1.320e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 04:59:18,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3230226.6666666665, ans=0.1 2023-11-26 04:59:30,885 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484550 2023-11-26 04:59:37,117 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3600, loss[loss=0.09445, simple_loss=0.1283, pruned_loss=0.02419, audio_tagging_loss=0.006122, over 15855.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09059, pruned_loss=0.01277, audio_tagging_loss=0.008851, over 3047187.46 frames.
], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:59:45,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230360.0, ans=0.1 2023-11-26 04:59:48,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3230426.6666666665, ans=0.0 2023-11-26 05:00:00,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3230493.3333333335, ans=0.125 2023-11-26 05:00:15,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3230560.0, ans=0.2 2023-11-26 05:00:25,916 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484600 2023-11-26 05:00:27,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230626.6666666665, ans=0.1 2023-11-26 05:00:29,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3230626.6666666665, ans=0.1 2023-11-26 05:00:32,960 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3650, loss[loss=0.06514, simple_loss=0.08035, pruned_loss=0.01404, audio_tagging_loss=0.01092, over 14880.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08989, pruned_loss=0.01274, audio_tagging_loss=0.008856, over 3052246.22 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:00:48,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3230760.0, ans=0.0 2023-11-26 05:01:08,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.921e+01 9.497e+01 1.030e+02 1.167e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 05:01:14,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3230893.3333333335, ans=0.125 2023-11-26 05:01:17,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.75 vs. limit=22.5 2023-11-26 05:01:20,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3230960.0, ans=0.2 2023-11-26 05:01:21,820 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484650 2023-11-26 05:01:28,666 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3700, loss[loss=0.07678, simple_loss=0.1119, pruned_loss=0.01414, audio_tagging_loss=0.006721, over 14576.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08956, pruned_loss=0.01263, audio_tagging_loss=0.008853, over 3051922.20 frames. 
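], batch size: 56, lr: 1.65e-03, grad_scale: 32.0

The optim.py:476 records report five statistics of recently observed gradient norms (reading as min, the three quartiles, and max) plus an adaptive clipping threshold. In every record here the threshold equals Clipping_scale=2.0 times the printed median (e.g. 2.0 * 9.497e+01 is approximately 1.899e+02 in the record above), and percent-clipped is the share of recent batches whose norm exceeded it. A hypothetical sketch of that scheme; the class name and window size are assumptions, not icefall's optimizer internals.

```python
from collections import deque
import statistics

class AdaptiveGradClipper:
    """Clip gradient norms at clipping_scale * median of recent norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.recent_norms = deque(maxlen=window)

    def threshold(self) -> float:
        return self.clipping_scale * statistics.median(self.recent_norms)

    def clip_factor(self, grad_norm: float) -> float:
        """Factor to multiply gradients by (1.0 when under the threshold)."""
        self.recent_norms.append(grad_norm)
        return min(1.0, self.threshold() / max(grad_norm, 1e-20))
```

Tying the threshold to the running median makes the clip self-calibrating: it follows the natural drift of gradient magnitudes over training instead of relying on a fixed constant.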
2023-11-26 05:01:32,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3231026.6666666665, ans=0.0 2023-11-26 05:01:33,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3231026.6666666665, ans=0.0 2023-11-26 05:01:38,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3231026.6666666665, ans=0.1 2023-11-26 05:01:56,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3231160.0, ans=0.95 2023-11-26 05:02:18,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484700 2023-11-26 05:02:24,657 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3750, loss[loss=0.07116, simple_loss=0.08479, pruned_loss=0.01644, audio_tagging_loss=0.01233, over 14291.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.0895, pruned_loss=0.01259, audio_tagging_loss=0.008902, over 3050607.51 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:02:39,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3231426.6666666665, ans=0.5 2023-11-26 05:02:46,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3231493.3333333335, ans=0.125 2023-11-26 05:02:49,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3231493.3333333335, ans=0.125 2023-11-26 05:02:50,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=15.0 2023-11-26 05:02:52,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3231493.3333333335, ans=0.09899494936611666 2023-11-26 05:03:02,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.844e+01 9.429e+01 1.038e+02 1.452e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 05:03:02,861 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:03:11,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3231626.6666666665, ans=0.125 2023-11-26 05:03:13,396 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484750 2023-11-26 05:03:19,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3231693.3333333335, ans=0.0 2023-11-26 05:03:20,222 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3800, loss[loss=0.07589, simple_loss=0.1024, pruned_loss=0.01355, audio_tagging_loss=0.01113, over 15344.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08963, pruned_loss=0.01257, audio_tagging_loss=0.009014, over 3055629.88 frames.
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:03:21,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3231693.3333333335, ans=0.125 2023-11-26 05:03:43,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0 2023-11-26 05:03:47,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3231826.6666666665, ans=0.1 2023-11-26 05:03:58,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3231893.3333333335, ans=0.125 2023-11-26 05:03:59,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3231893.3333333335, ans=0.09899494936611666 2023-11-26 05:04:07,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3231960.0, ans=0.125 2023-11-26 05:04:09,527 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484800 2023-11-26 05:04:16,587 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3850, loss[loss=0.04994, simple_loss=0.06137, pruned_loss=0.01065, audio_tagging_loss=0.008605, over 14728.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08899, pruned_loss=0.01249, audio_tagging_loss=0.009087, over 3052174.05 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:04:24,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3232026.6666666665, ans=0.125 2023-11-26 05:04:26,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3232093.3333333335, ans=0.125 2023-11-26 05:04:33,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3232093.3333333335, ans=0.2 2023-11-26 05:04:51,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2023-11-26 05:04:54,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.842e+01 9.367e+01 1.019e+02 1.484e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 05:04:59,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3232226.6666666665, ans=0.1 2023-11-26 05:05:01,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.64 vs. limit=10.0 2023-11-26 05:05:05,896 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484850 2023-11-26 05:05:12,184 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3900, loss[loss=0.07531, simple_loss=0.101, pruned_loss=0.01458, audio_tagging_loss=0.01025, over 14492.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09099, pruned_loss=0.01297, audio_tagging_loss=0.009044, over 3047067.78 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:05:12,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3232360.0, ans=0.04949747468305833 2023-11-26 05:05:13,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3232360.0, ans=0.125 2023-11-26 05:05:33,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=22.5 2023-11-26 05:05:40,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3232493.3333333335, ans=0.125 2023-11-26 05:05:50,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3232560.0, ans=0.07 2023-11-26 05:05:57,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3232626.6666666665, ans=0.2 2023-11-26 05:06:01,422 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484900 2023-11-26 05:06:07,640 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3950, loss[loss=0.08256, simple_loss=0.1111, pruned_loss=0.01986, audio_tagging_loss=0.007131, over 14795.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09044, pruned_loss=0.01284, audio_tagging_loss=0.009098, over 3050740.19 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:06:10,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3232693.3333333335, ans=0.125 2023-11-26 05:06:15,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3232693.3333333335, ans=0.0 2023-11-26 05:06:21,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3232760.0, ans=0.125 2023-11-26 05:06:23,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3232760.0, ans=0.125 2023-11-26 05:06:27,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-26 05:06:29,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3232760.0, ans=0.125 2023-11-26 05:06:33,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=12.0 2023-11-26 05:06:38,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3232826.6666666665, ans=0.1 2023-11-26 05:06:45,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.918e+01 9.453e+01 1.012e+02 1.260e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 05:06:57,174 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484950 2023-11-26 05:07:04,062 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4000, loss[loss=0.07489, simple_loss=0.1082, pruned_loss=0.01529, audio_tagging_loss=0.005479, over 14637.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09094, pruned_loss=0.01294, audio_tagging_loss=0.009106, over 3051764.46 frames. 
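], batch size: 56, lr: 1.65e-03, grad_scale: 32.0

The scaling.py:213 "ScheduledFloat" records track hyperparameters (dropout probabilities, skip rates, balancer limits) that are functions of batch_count rather than constants; by this point in training each schedule has flattened out, which is why the same ans=... values repeat. A minimal stand-in, assuming a piecewise-linear schedule over (batch_count, value) breakpoints; the real ScheduledFloat in icefall's scaling.py differs in detail.

```python
from typing import List, Tuple

def scheduled_float(batch_count: float,
                    schedule: List[Tuple[float, float]]) -> float:
    """Piecewise-linear interpolation over (batch_count, value) breakpoints;
    clamps to the first/last value outside the schedule."""
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            # linear interpolation between the surrounding breakpoints
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0

# e.g. a dropout annealed from 0.3 to 0.1 over the first 20k batch-count units:
print(scheduled_float(3233093.3, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1
```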
2023-11-26 05:07:22,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3233093.3333333335, ans=0.125 2023-11-26 05:07:25,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3233160.0, ans=0.125 2023-11-26 05:07:41,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3233226.6666666665, ans=0.0 2023-11-26 05:07:45,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3233226.6666666665, ans=0.125 2023-11-26 05:07:52,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2023-11-26 05:07:54,539 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485000 2023-11-26 05:08:00,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3233360.0, ans=0.125 2023-11-26 05:08:01,193 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4050, loss[loss=0.06586, simple_loss=0.09239, pruned_loss=0.01081, audio_tagging_loss=0.008854, over 14124.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09121, pruned_loss=0.01295, audio_tagging_loss=0.009116, over 3046734.58 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:08:03,386 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:08:16,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3233426.6666666665, ans=0.125 2023-11-26 05:08:25,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3233493.3333333335, ans=0.125 2023-11-26 05:08:34,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3233560.0, ans=0.1 2023-11-26 05:08:38,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.932e+01 9.380e+01 1.022e+02 1.358e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 05:08:45,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3233626.6666666665, ans=0.1 2023-11-26 05:08:49,765 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485050 2023-11-26 05:08:50,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3233626.6666666665, ans=0.1 2023-11-26 05:08:56,102 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4100, loss[loss=0.05213, simple_loss=0.0698, pruned_loss=0.01053, audio_tagging_loss=0.0067, over 15191.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09126, pruned_loss=0.01291, audio_tagging_loss=0.009034, over 3047009.09 frames.
], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:09:10,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3233760.0, ans=0.125 2023-11-26 05:09:11,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3233760.0, ans=0.0 2023-11-26 05:09:14,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3233760.0, ans=0.0 2023-11-26 05:09:33,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3233893.3333333335, ans=0.125 2023-11-26 05:09:45,638 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485100 2023-11-26 05:09:51,896 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4150, loss[loss=0.07962, simple_loss=0.1123, pruned_loss=0.01739, audio_tagging_loss=0.006075, over 15286.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09135, pruned_loss=0.01286, audio_tagging_loss=0.008914, over 3050420.89 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:09:54,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3234026.6666666665, ans=0.1 2023-11-26 05:10:22,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3234160.0, ans=0.04949747468305833 2023-11-26 05:10:26,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3234226.6666666665, ans=0.125 2023-11-26 05:10:30,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.624e+01 9.472e+01 1.019e+02 1.478e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 05:10:32,258 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:10:41,377 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485150 2023-11-26 05:10:41,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3234293.3333333335, ans=0.1 2023-11-26 05:10:45,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3234293.3333333335, ans=0.0 2023-11-26 05:10:48,110 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4200, loss[loss=0.08434, simple_loss=0.1132, pruned_loss=0.01985, audio_tagging_loss=0.007885, over 15677.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09144, pruned_loss=0.01274, audio_tagging_loss=0.008748, over 3055675.14 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:10:48,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3234360.0, ans=0.125 2023-11-26 05:10:51,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3234360.0, ans=0.125 2023-11-26 05:10:55,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3234360.0, ans=0.1 2023-11-26 05:11:04,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3234426.6666666665, ans=0.0 2023-11-26 05:11:04,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3234426.6666666665, ans=0.1 2023-11-26 05:11:11,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3234493.3333333335, ans=0.0 2023-11-26 05:11:17,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3234493.3333333335, ans=0.2 2023-11-26 05:11:27,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3234560.0, ans=0.1 2023-11-26 05:11:37,143 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485200 2023-11-26 05:11:43,692 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4250, loss[loss=0.05212, simple_loss=0.06172, pruned_loss=0.008825, audio_tagging_loss=0.01243, over 15166.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09148, pruned_loss=0.0126, audio_tagging_loss=0.008819, over 3062555.16 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:11:57,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3234760.0, ans=0.125 2023-11-26 05:11:58,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3234760.0, ans=0.125 2023-11-26 05:12:11,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3234826.6666666665, ans=0.1 2023-11-26 05:12:21,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.763e+01 9.377e+01 1.004e+02 4.197e+02, threshold=1.875e+02, percent-clipped=1.0 2023-11-26 05:12:29,426 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:12:33,019 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485250 2023-11-26 05:12:37,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3234960.0, ans=0.025 2023-11-26 05:12:39,367 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4300, loss[loss=0.06434, simple_loss=0.08995, pruned_loss=0.01126, audio_tagging_loss=0.008103, over 14787.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09095, pruned_loss=0.0125, audio_tagging_loss=0.008792, over 3063505.29 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:12:46,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3235026.6666666665, ans=0.1 2023-11-26 05:13:04,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=12.0 2023-11-26 05:13:08,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2023-11-26 05:13:10,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3235160.0, ans=0.125 2023-11-26 05:13:16,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2023-11-26 05:13:26,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3235293.3333333335, ans=0.0 2023-11-26 05:13:28,997 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485300 2023-11-26 05:13:35,808 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4350, loss[loss=0.06301, simple_loss=0.08722, pruned_loss=0.01096, audio_tagging_loss=0.008438, over 15963.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09082, pruned_loss=0.01243, audio_tagging_loss=0.008771, over 3054129.53 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:13:54,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3235426.6666666665, ans=0.1 2023-11-26 05:14:04,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3235493.3333333335, ans=0.1 2023-11-26 05:14:13,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3235560.0, ans=0.04949747468305833 2023-11-26 05:14:13,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3235560.0, ans=0.125 2023-11-26 05:14:14,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.639e+01 9.414e+01 1.000e+02 1.262e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 05:14:25,137 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485350 2023-11-26 05:14:26,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3235626.6666666665, ans=0.125 2023-11-26 05:14:31,452 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4400, loss[loss=0.06635, simple_loss=0.08853, pruned_loss=0.01374, audio_tagging_loss=0.008344, over 14495.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09079, pruned_loss=0.01257, audio_tagging_loss=0.008727, over 3053400.10 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:14:34,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3235693.3333333335, ans=0.0 2023-11-26 05:14:43,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.80 vs. 
limit=22.5 2023-11-26 05:15:02,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3235826.6666666665, ans=0.2 2023-11-26 05:15:06,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3235893.3333333335, ans=0.5 2023-11-26 05:15:19,901 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485400 2023-11-26 05:15:22,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3235960.0, ans=0.0 2023-11-26 05:15:23,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3235960.0, ans=0.125 2023-11-26 05:15:27,070 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4450, loss[loss=0.08051, simple_loss=0.1142, pruned_loss=0.01397, audio_tagging_loss=0.009448, over 15301.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09083, pruned_loss=0.01262, audio_tagging_loss=0.008657, over 3058089.84 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:15:47,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3236093.3333333335, ans=0.2 2023-11-26 05:15:51,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.44 vs. limit=12.0 2023-11-26 05:16:06,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.912e+01 9.547e+01 1.021e+02 1.319e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 05:16:16,083 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485450 2023-11-26 05:16:19,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3236293.3333333335, ans=0.125 2023-11-26 05:16:23,543 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4500, loss[loss=0.06075, simple_loss=0.07353, pruned_loss=0.0133, audio_tagging_loss=0.01068, over 14748.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09163, pruned_loss=0.01266, audio_tagging_loss=0.008612, over 3054426.39 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:16:26,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3236360.0, ans=0.125 2023-11-26 05:16:35,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3236426.6666666665, ans=0.125 2023-11-26 05:16:56,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3236560.0, ans=0.1 2023-11-26 05:16:56,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3236560.0, ans=0.125 2023-11-26 05:16:59,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3236560.0, ans=0.1 2023-11-26 05:17:06,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3236560.0, ans=0.0 2023-11-26 05:17:06,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.86 vs. 
limit=15.0 2023-11-26 05:17:12,569 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485500 2023-11-26 05:17:12,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3236626.6666666665, ans=0.125 2023-11-26 05:17:15,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3236626.6666666665, ans=0.0 2023-11-26 05:17:15,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3236626.6666666665, ans=0.125 2023-11-26 05:17:18,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3236693.3333333335, ans=0.1 2023-11-26 05:17:19,340 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4550, loss[loss=0.05671, simple_loss=0.0846, pruned_loss=0.006653, audio_tagging_loss=0.007755, over 15411.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09144, pruned_loss=0.01259, audio_tagging_loss=0.008571, over 3055667.01 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:17:29,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3236760.0, ans=0.125 2023-11-26 05:17:30,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3236760.0, ans=0.1 2023-11-26 05:17:33,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236760.0, ans=0.1 2023-11-26 05:17:34,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-11-26 05:17:39,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3236760.0, ans=0.025 2023-11-26 05:17:39,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.59 vs. limit=10.0 2023-11-26 05:17:41,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=12.0 2023-11-26 05:17:57,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.528e+01 9.112e+01 9.671e+01 1.236e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 05:18:00,022 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:18:08,211 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485550 2023-11-26 05:18:15,119 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4600, loss[loss=0.06504, simple_loss=0.09238, pruned_loss=0.01185, audio_tagging_loss=0.006995, over 14935.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09082, pruned_loss=0.01251, audio_tagging_loss=0.008753, over 3054438.43 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:18:36,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.13 vs. limit=22.5 2023-11-26 05:18:54,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2023-11-26 05:19:00,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3237293.3333333335, ans=0.125 2023-11-26 05:19:03,803 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485600 2023-11-26 05:19:10,777 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4650, loss[loss=0.06671, simple_loss=0.09145, pruned_loss=0.01204, audio_tagging_loss=0.008945, over 14713.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09107, pruned_loss=0.01269, audio_tagging_loss=0.008853, over 3054218.02 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:19:20,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3237360.0, ans=0.125 2023-11-26 05:19:35,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3237493.3333333335, ans=0.125 2023-11-26 05:19:43,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3237560.0, ans=0.0 2023-11-26 05:19:44,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3237560.0, ans=0.2 2023-11-26 05:19:45,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3237560.0, ans=0.125 2023-11-26 05:19:51,902 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.706e+01 9.399e+01 1.022e+02 1.601e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 05:19:59,981 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485650 2023-11-26 05:20:01,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3237626.6666666665, ans=0.125 2023-11-26 05:20:06,206 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4700, loss[loss=0.07619, simple_loss=0.101, pruned_loss=0.01548, audio_tagging_loss=0.01019, over 14050.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09117, pruned_loss=0.01279, audio_tagging_loss=0.008963, over 3049451.98 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:20:06,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3237693.3333333335, ans=0.2 2023-11-26 05:20:27,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3237826.6666666665, ans=0.07 2023-11-26 05:20:40,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.66 vs. 
limit=15.0 2023-11-26 05:20:43,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3237893.3333333335, ans=0.125 2023-11-26 05:20:54,760 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485700 2023-11-26 05:21:02,122 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4750, loss[loss=0.08613, simple_loss=0.1207, pruned_loss=0.01811, audio_tagging_loss=0.00767, over 15822.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09096, pruned_loss=0.01258, audio_tagging_loss=0.008959, over 3042542.58 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:21:09,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3238026.6666666665, ans=0.0 2023-11-26 05:21:22,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3238093.3333333335, ans=0.0 2023-11-26 05:21:36,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3238226.6666666665, ans=0.0 2023-11-26 05:21:42,924 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.672e+01 9.207e+01 9.886e+01 1.229e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 05:21:50,945 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485750 2023-11-26 05:21:52,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3238293.3333333335, ans=0.125 2023-11-26 05:21:57,739 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4800, loss[loss=0.05409, simple_loss=0.06053, pruned_loss=0.01121, audio_tagging_loss=0.01261, over 14588.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09117, pruned_loss=0.01275, audio_tagging_loss=0.009073, over 3041511.11 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:21:59,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5 2023-11-26 05:22:01,255 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:22:08,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3238426.6666666665, ans=0.125 2023-11-26 05:22:10,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3238426.6666666665, ans=0.2 2023-11-26 05:22:13,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3238426.6666666665, ans=0.0 2023-11-26 05:22:33,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3238560.0, ans=0.0 2023-11-26 05:22:36,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3238560.0, ans=0.0 2023-11-26 05:22:46,908 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485800 2023-11-26 05:22:53,459 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4850, loss[loss=0.08075, simple_loss=0.1107, pruned_loss=0.01885, audio_tagging_loss=0.006553, over 15239.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09119, pruned_loss=0.01273, audio_tagging_loss=0.009154, over 3047532.47 frames. 
], batch size: 53, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:23:10,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=22.5 2023-11-26 05:23:32,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3238893.3333333335, ans=0.0 2023-11-26 05:23:33,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0 2023-11-26 05:23:35,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.610e+01 9.289e+01 1.009e+02 1.598e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 05:23:36,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3238960.0, ans=0.125 2023-11-26 05:23:37,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3238960.0, ans=0.125 2023-11-26 05:23:41,951 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485850 2023-11-26 05:23:48,249 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4900, loss[loss=0.06615, simple_loss=0.09169, pruned_loss=0.01188, audio_tagging_loss=0.008424, over 14593.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09091, pruned_loss=0.01265, audio_tagging_loss=0.009115, over 3044811.20 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:23:48,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2023-11-26 05:24:05,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0 2023-11-26 05:24:07,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3239093.3333333335, ans=0.125 2023-11-26 05:24:15,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2023-11-26 05:24:18,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0 2023-11-26 05:24:29,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3239226.6666666665, ans=0.125 2023-11-26 05:24:30,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3239226.6666666665, ans=0.0 2023-11-26 05:24:37,349 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485900 2023-11-26 05:24:43,568 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4950, loss[loss=0.0705, simple_loss=0.09298, pruned_loss=0.01772, audio_tagging_loss=0.006286, over 16205.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08983, pruned_loss=0.01244, audio_tagging_loss=0.009076, over 3039063.76 frames. 
], batch size: 61, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:24:46,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3239360.0, ans=0.0 2023-11-26 05:24:47,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3239360.0, ans=0.0 2023-11-26 05:24:49,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3239360.0, ans=0.125 2023-11-26 05:24:54,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3239426.6666666665, ans=0.015 2023-11-26 05:25:10,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.50 vs. limit=15.0 2023-11-26 05:25:15,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3239560.0, ans=0.125 2023-11-26 05:25:17,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3239560.0, ans=0.0 2023-11-26 05:25:25,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.689e+01 9.233e+01 9.794e+01 1.211e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 05:25:33,272 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485950 2023-11-26 05:25:39,620 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5000, loss[loss=0.05951, simple_loss=0.0816, pruned_loss=0.01094, audio_tagging_loss=0.00777, over 15952.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08986, pruned_loss=0.01244, audio_tagging_loss=0.008895, over 3040797.16 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:25:53,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3239760.0, ans=0.5 2023-11-26 05:26:05,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3239826.6666666665, ans=0.125 2023-11-26 05:26:06,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3239826.6666666665, ans=0.0 2023-11-26 05:26:07,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3239826.6666666665, ans=0.125 2023-11-26 05:26:07,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3239826.6666666665, ans=0.0 2023-11-26 05:26:12,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3239893.3333333335, ans=0.0 2023-11-26 05:26:16,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3239893.3333333335, ans=0.125 2023-11-26 05:26:23,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3239960.0, ans=0.125 2023-11-26 05:26:28,158 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486000 2023-11-26 05:26:34,643 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5050, loss[loss=0.0964, simple_loss=0.1362, pruned_loss=0.01984, audio_tagging_loss=0.008453, over 16175.00 frames. 
], tot_loss[loss=0.06662, simple_loss=0.09026, pruned_loss=0.0127, audio_tagging_loss=0.008793, over 3036013.23 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:26:44,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3240026.6666666665, ans=0.0 2023-11-26 05:27:12,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3240226.6666666665, ans=0.1 2023-11-26 05:27:14,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3240226.6666666665, ans=0.2 2023-11-26 05:27:16,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 8.723e+01 9.210e+01 1.029e+02 1.181e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 05:27:23,670 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486050 2023-11-26 05:27:30,372 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5100, loss[loss=0.0597, simple_loss=0.08598, pruned_loss=0.007624, audio_tagging_loss=0.009093, over 15973.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09155, pruned_loss=0.01289, audio_tagging_loss=0.008693, over 3036795.49 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:27:40,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=12.0 2023-11-26 05:27:45,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3240426.6666666665, ans=0.125 2023-11-26 05:27:51,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3240493.3333333335, ans=0.125 2023-11-26 05:27:53,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2023-11-26 05:28:19,372 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486100 2023-11-26 05:28:25,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3240693.3333333335, ans=0.125 2023-11-26 05:28:26,091 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5150, loss[loss=0.07557, simple_loss=0.09926, pruned_loss=0.0125, audio_tagging_loss=0.01343, over 15680.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08987, pruned_loss=0.0125, audio_tagging_loss=0.008757, over 3042939.45 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:28:32,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3240693.3333333335, ans=0.2 2023-11-26 05:28:48,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3240826.6666666665, ans=0.2 2023-11-26 05:29:00,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3240893.3333333335, ans=0.0 2023-11-26 05:29:02,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3240893.3333333335, ans=0.1 2023-11-26 05:29:08,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.813e+01 9.450e+01 1.017e+02 1.282e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 05:29:09,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3240960.0, ans=0.2 2023-11-26 05:29:14,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486150 2023-11-26 05:29:21,073 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5200, loss[loss=0.05475, simple_loss=0.06994, pruned_loss=0.01034, audio_tagging_loss=0.009446, over 14986.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09015, pruned_loss=0.0126, audio_tagging_loss=0.008653, over 3042158.25 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:29:23,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3241026.6666666665, ans=0.125 2023-11-26 05:29:28,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3241026.6666666665, ans=0.025 2023-11-26 05:29:36,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3241093.3333333335, ans=0.125 2023-11-26 05:29:40,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2023-11-26 05:30:03,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3241226.6666666665, ans=0.0 2023-11-26 05:30:10,297 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486200 2023-11-26 05:30:11,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2023-11-26 05:30:16,781 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5250, loss[loss=0.05206, simple_loss=0.07485, pruned_loss=0.007102, audio_tagging_loss=0.007531, over 15096.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08993, pruned_loss=0.01254, audio_tagging_loss=0.008669, over 3039991.10 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:30:19,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3241360.0, ans=15.0 2023-11-26 05:30:24,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.76 vs. 
limit=15.0 2023-11-26 05:30:27,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3241426.6666666665, ans=0.125 2023-11-26 05:30:49,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3241560.0, ans=0.0 2023-11-26 05:30:58,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.725e+01 9.409e+01 1.008e+02 1.630e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 05:31:05,883 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486250 2023-11-26 05:31:06,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.72 vs. limit=15.0 2023-11-26 05:31:13,301 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5300, loss[loss=0.09943, simple_loss=0.1394, pruned_loss=0.02253, audio_tagging_loss=0.00718, over 15279.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09066, pruned_loss=0.01275, audio_tagging_loss=0.008618, over 3037617.55 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:31:13,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3241693.3333333335, ans=0.05 2023-11-26 05:31:14,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3241693.3333333335, ans=0.0 2023-11-26 05:31:19,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0 2023-11-26 05:31:20,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3241693.3333333335, ans=0.0 2023-11-26 05:31:38,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3241826.6666666665, ans=0.125 2023-11-26 05:31:43,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3241826.6666666665, ans=0.125 2023-11-26 05:31:49,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3241893.3333333335, ans=0.2 2023-11-26 05:31:50,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3241893.3333333335, ans=0.125 2023-11-26 05:32:01,941 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486300 2023-11-26 05:32:08,106 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5350, loss[loss=0.06913, simple_loss=0.09238, pruned_loss=0.01664, audio_tagging_loss=0.006304, over 14517.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09015, pruned_loss=0.01263, audio_tagging_loss=0.00871, over 3037512.86 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:32:14,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3242026.6666666665, ans=0.125 2023-11-26 05:32:28,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3242160.0, ans=0.0 2023-11-26 05:32:33,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. 
limit=15.0 2023-11-26 05:32:38,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3242160.0, ans=0.125 2023-11-26 05:32:49,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.485e+01 9.147e+01 9.991e+01 1.214e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-26 05:32:56,335 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486350 2023-11-26 05:33:00,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3242293.3333333335, ans=0.0 2023-11-26 05:33:01,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3242293.3333333335, ans=0.2 2023-11-26 05:33:03,203 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5400, loss[loss=0.07057, simple_loss=0.1027, pruned_loss=0.01173, audio_tagging_loss=0.007482, over 16002.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.0891, pruned_loss=0.0124, audio_tagging_loss=0.008819, over 3034411.89 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:33:51,789 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486400 2023-11-26 05:33:54,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3242626.6666666665, ans=0.125 2023-11-26 05:33:59,180 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5450, loss[loss=0.07658, simple_loss=0.1046, pruned_loss=0.01571, audio_tagging_loss=0.008565, over 15852.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08979, pruned_loss=0.01249, audio_tagging_loss=0.008891, over 3034270.22 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:34:00,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.35 vs. limit=10.0 2023-11-26 05:34:20,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3242826.6666666665, ans=0.95 2023-11-26 05:34:32,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3242893.3333333335, ans=0.125 2023-11-26 05:34:41,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.605e+01 9.179e+01 9.906e+01 1.952e+02, threshold=1.836e+02, percent-clipped=1.0 2023-11-26 05:34:42,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2023-11-26 05:34:43,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3242960.0, ans=0.2 2023-11-26 05:34:48,116 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486450 2023-11-26 05:34:54,497 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5500, loss[loss=0.07017, simple_loss=0.09366, pruned_loss=0.01156, audio_tagging_loss=0.01177, over 15569.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08958, pruned_loss=0.01235, audio_tagging_loss=0.008993, over 3040307.48 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:35:07,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.93 vs. 
limit=22.5 2023-11-26 05:35:26,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3243226.6666666665, ans=0.0 2023-11-26 05:35:42,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486500 2023-11-26 05:35:46,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3243293.3333333335, ans=0.125 2023-11-26 05:35:49,793 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5550, loss[loss=0.06233, simple_loss=0.08106, pruned_loss=0.01165, audio_tagging_loss=0.01015, over 15833.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09003, pruned_loss=0.01246, audio_tagging_loss=0.009067, over 3039112.30 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:35:52,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3243360.0, ans=0.1 2023-11-26 05:35:54,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3243360.0, ans=0.0 2023-11-26 05:35:56,568 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:35:59,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3243426.6666666665, ans=0.1 2023-11-26 05:36:07,106 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:36:25,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3243560.0, ans=0.1 2023-11-26 05:36:31,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3243560.0, ans=0.125 2023-11-26 05:36:32,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.745e+01 9.267e+01 1.002e+02 1.641e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 05:36:38,522 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486550 2023-11-26 05:36:39,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3243626.6666666665, ans=0.0 2023-11-26 05:36:45,268 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5600, loss[loss=0.05275, simple_loss=0.06744, pruned_loss=0.009561, audio_tagging_loss=0.00947, over 14693.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.0895, pruned_loss=0.01234, audio_tagging_loss=0.009084, over 3036883.15 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:36:46,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3243693.3333333335, ans=0.2 2023-11-26 05:37:06,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=22.5 2023-11-26 05:37:23,715 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:37:34,268 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486600 2023-11-26 05:37:40,749 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5650, loss[loss=0.05621, simple_loss=0.05897, pruned_loss=0.01105, audio_tagging_loss=0.01568, over 14984.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08921, pruned_loss=0.01232, audio_tagging_loss=0.009197, over 3039957.62 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:37:46,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3244026.6666666665, ans=0.125 2023-11-26 05:38:23,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.720e+01 9.280e+01 9.877e+01 1.364e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 05:38:29,538 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486650 2023-11-26 05:38:35,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2023-11-26 05:38:35,771 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5700, loss[loss=0.06295, simple_loss=0.08515, pruned_loss=0.01021, audio_tagging_loss=0.01016, over 14759.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08943, pruned_loss=0.01238, audio_tagging_loss=0.00914, over 3037962.48 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:38:38,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3244360.0, ans=0.0 2023-11-26 05:38:39,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3244360.0, ans=0.0 2023-11-26 05:38:49,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3244426.6666666665, ans=0.1 2023-11-26 05:38:59,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3244493.3333333335, ans=0.2 2023-11-26 05:39:03,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3244493.3333333335, ans=0.0 2023-11-26 05:39:14,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3244560.0, ans=0.125 2023-11-26 05:39:24,749 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486700 2023-11-26 05:39:26,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3244626.6666666665, ans=0.2 2023-11-26 05:39:27,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3244626.6666666665, ans=0.125 2023-11-26 05:39:31,508 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5750, loss[loss=0.06, simple_loss=0.08541, pruned_loss=0.009157, audio_tagging_loss=0.008136, over 15270.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08932, pruned_loss=0.01228, audio_tagging_loss=0.009049, over 3042811.39 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:39:31,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.58 vs. limit=10.0 2023-11-26 05:40:12,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3244893.3333333335, ans=0.0 2023-11-26 05:40:12,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2023-11-26 05:40:15,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.613e+01 9.170e+01 1.044e+02 1.478e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-26 05:40:15,944 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:40:20,561 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486750 2023-11-26 05:40:26,820 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5800, loss[loss=0.04859, simple_loss=0.06798, pruned_loss=0.008557, audio_tagging_loss=0.006039, over 15404.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08951, pruned_loss=0.01232, audio_tagging_loss=0.008975, over 3040718.86 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:40:30,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3245026.6666666665, ans=15.0 2023-11-26 05:40:31,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3245026.6666666665, ans=0.05 2023-11-26 05:40:47,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. limit=10.0 2023-11-26 05:41:15,438 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486800 2023-11-26 05:41:21,968 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5850, loss[loss=0.07404, simple_loss=0.1075, pruned_loss=0.0109, audio_tagging_loss=0.009412, over 14505.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08988, pruned_loss=0.01245, audio_tagging_loss=0.008832, over 3040937.94 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:41:24,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3245360.0, ans=0.0 2023-11-26 05:41:46,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3245493.3333333335, ans=0.1 2023-11-26 05:41:55,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3245560.0, ans=0.125 2023-11-26 05:42:06,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.540e+01 9.221e+01 1.014e+02 1.317e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 05:42:11,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486850 2023-11-26 05:42:17,868 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5900, loss[loss=0.0664, simple_loss=0.08287, pruned_loss=0.01716, audio_tagging_loss=0.007813, over 14485.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09014, pruned_loss=0.01247, audio_tagging_loss=0.0088, over 3042183.19 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:42:22,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3245693.3333333335, ans=0.1 2023-11-26 05:42:31,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3245760.0, ans=0.125 2023-11-26 05:43:02,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3245960.0, ans=0.125 2023-11-26 05:43:06,748 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486900 2023-11-26 05:43:13,575 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5950, loss[loss=0.05244, simple_loss=0.06376, pruned_loss=0.0101, audio_tagging_loss=0.01046, over 15517.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09042, pruned_loss=0.0126, audio_tagging_loss=0.008748, over 3040343.49 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:43:20,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3246026.6666666665, ans=0.125 2023-11-26 05:43:35,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3246160.0, ans=0.125 2023-11-26 05:43:49,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3246226.6666666665, ans=0.125 2023-11-26 05:43:57,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.522e+01 9.337e+01 1.020e+02 1.344e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 05:44:02,127 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486950 2023-11-26 05:44:05,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3246293.3333333335, ans=0.2 2023-11-26 05:44:08,298 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6000, loss[loss=0.05904, simple_loss=0.07775, pruned_loss=0.01087, audio_tagging_loss=0.009287, over 15619.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08993, pruned_loss=0.01239, audio_tagging_loss=0.008742, over 3042272.03 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:44:08,299 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 05:44:40,559 INFO [train_asr.py:1267] (1/4) Epoch 41, validation: loss=0.05752, simple_loss=0.0506, pruned_loss=0.005164, audio_tagging_loss=0.02705, over 4681554.00 frames. 2023-11-26 05:44:40,559 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 05:44:45,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3246360.0, ans=0.125 2023-11-26 05:45:07,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3246493.3333333335, ans=0.1 2023-11-26 05:45:09,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.05 vs. 
limit=10.0 2023-11-26 05:45:15,879 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:45:20,426 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:45:29,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487000 2023-11-26 05:45:33,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3246626.6666666665, ans=0.125 2023-11-26 05:45:36,453 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6050, loss[loss=0.06655, simple_loss=0.08997, pruned_loss=0.01198, audio_tagging_loss=0.009577, over 15758.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08996, pruned_loss=0.01236, audio_tagging_loss=0.008755, over 3042544.68 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:45:42,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3246693.3333333335, ans=0.125 2023-11-26 05:45:43,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3246693.3333333335, ans=0.125 2023-11-26 05:45:44,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3246693.3333333335, ans=0.05 2023-11-26 05:45:47,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.09 vs. limit=5.0 2023-11-26 05:45:48,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3246760.0, ans=0.2 2023-11-26 05:45:52,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-26 05:46:01,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3246826.6666666665, ans=0.125 2023-11-26 05:46:15,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3246893.3333333335, ans=0.1 2023-11-26 05:46:20,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3246960.0, ans=0.2 2023-11-26 05:46:21,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.605e+01 9.174e+01 9.669e+01 1.333e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-26 05:46:25,742 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487050 2023-11-26 05:46:32,120 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6100, loss[loss=0.05232, simple_loss=0.07313, pruned_loss=0.007999, audio_tagging_loss=0.007757, over 15545.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09047, pruned_loss=0.01255, audio_tagging_loss=0.008689, over 3045133.40 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:46:33,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.39 vs. limit=22.5 2023-11-26 05:46:47,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3247093.3333333335, ans=0.0 2023-11-26 05:46:51,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3247093.3333333335, ans=0.125 2023-11-26 05:46:53,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.00 vs. limit=10.0 2023-11-26 05:47:06,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3247226.6666666665, ans=0.125 2023-11-26 05:47:21,770 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487100 2023-11-26 05:47:28,040 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6150, loss[loss=0.07428, simple_loss=0.1043, pruned_loss=0.01644, audio_tagging_loss=0.00571, over 14483.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08998, pruned_loss=0.01252, audio_tagging_loss=0.008808, over 3045968.42 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:47:37,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.00 vs. limit=15.0 2023-11-26 05:47:43,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3247426.6666666665, ans=0.125 2023-11-26 05:47:46,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3247426.6666666665, ans=0.09899494936611666 2023-11-26 05:47:48,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247426.6666666665, ans=0.1 2023-11-26 05:47:51,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3247493.3333333335, ans=0.0 2023-11-26 05:47:51,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3247493.3333333335, ans=0.125 2023-11-26 05:48:06,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3247560.0, ans=0.0 2023-11-26 05:48:07,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.78 vs. 
limit=15.0 2023-11-26 05:48:10,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3247560.0, ans=0.07 2023-11-26 05:48:12,591 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.728e+01 9.335e+01 1.012e+02 1.245e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 05:48:17,906 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487150 2023-11-26 05:48:20,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247626.6666666665, ans=0.1 2023-11-26 05:48:22,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.92 vs. limit=10.0 2023-11-26 05:48:24,209 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6200, loss[loss=0.07752, simple_loss=0.1087, pruned_loss=0.01816, audio_tagging_loss=0.005031, over 14697.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09021, pruned_loss=0.01261, audio_tagging_loss=0.008793, over 3041822.88 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:48:25,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247693.3333333335, ans=0.1 2023-11-26 05:48:44,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3247760.0, ans=10.0 2023-11-26 05:48:49,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2023-11-26 05:48:58,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2023-11-26 05:49:03,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-11-26 05:49:10,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3247960.0, ans=0.125 2023-11-26 05:49:11,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0 2023-11-26 05:49:13,229 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487200 2023-11-26 05:49:13,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3247960.0, ans=0.07 2023-11-26 05:49:19,824 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6250, loss[loss=0.07231, simple_loss=0.08375, pruned_loss=0.0174, audio_tagging_loss=0.01303, over 14021.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08959, pruned_loss=0.01257, audio_tagging_loss=0.008921, over 3041832.77 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:49:20,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3248026.6666666665, ans=0.125 2023-11-26 05:49:32,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.77 vs. 
limit=15.0 2023-11-26 05:49:41,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3248160.0, ans=0.2 2023-11-26 05:49:42,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3248160.0, ans=0.0 2023-11-26 05:49:46,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3248160.0, ans=0.025 2023-11-26 05:49:54,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3248226.6666666665, ans=0.125 2023-11-26 05:50:04,066 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.628e+01 9.158e+01 1.005e+02 1.454e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 05:50:06,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3248293.3333333335, ans=0.1 2023-11-26 05:50:08,398 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487250 2023-11-26 05:50:14,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3248360.0, ans=0.5 2023-11-26 05:50:15,171 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6300, loss[loss=0.0759, simple_loss=0.1097, pruned_loss=0.01354, audio_tagging_loss=0.007529, over 15151.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08956, pruned_loss=0.01261, audio_tagging_loss=0.008993, over 3040066.41 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:50:25,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3248360.0, ans=0.1 2023-11-26 05:50:31,779 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:51:03,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3248626.6666666665, ans=0.5 2023-11-26 05:51:04,438 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487300 2023-11-26 05:51:07,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3248626.6666666665, ans=0.125 2023-11-26 05:51:11,868 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6350, loss[loss=0.06088, simple_loss=0.07521, pruned_loss=0.0126, audio_tagging_loss=0.01067, over 15259.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.0899, pruned_loss=0.01266, audio_tagging_loss=0.009005, over 3045553.07 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:51:27,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3248760.0, ans=0.05 2023-11-26 05:51:32,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3248826.6666666665, ans=0.2 2023-11-26 05:51:35,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3248826.6666666665, ans=0.025 2023-11-26 05:51:37,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3248826.6666666665, ans=0.125 2023-11-26 05:51:38,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=22.5 2023-11-26 05:51:39,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3248826.6666666665, ans=0.0 2023-11-26 05:51:40,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3248826.6666666665, ans=0.125 2023-11-26 05:51:43,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3248893.3333333335, ans=0.125 2023-11-26 05:51:50,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248893.3333333335, ans=0.1 2023-11-26 05:51:56,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.555e+01 9.166e+01 9.747e+01 1.455e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 05:52:00,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487350 2023-11-26 05:52:05,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3248960.0, ans=0.125 2023-11-26 05:52:06,939 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6400, loss[loss=0.07067, simple_loss=0.09913, pruned_loss=0.01219, audio_tagging_loss=0.00892, over 15197.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08919, pruned_loss=0.01255, audio_tagging_loss=0.009129, over 3042252.11 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:52:22,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3249093.3333333335, ans=0.0 2023-11-26 05:52:48,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3249226.6666666665, ans=0.125 2023-11-26 05:52:55,831 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487400 2023-11-26 05:52:58,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=15.0 2023-11-26 05:53:02,870 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6450, loss[loss=0.07372, simple_loss=0.1044, pruned_loss=0.01355, audio_tagging_loss=0.007977, over 14065.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08943, pruned_loss=0.01253, audio_tagging_loss=0.009247, over 3035205.69 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:53:06,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0 2023-11-26 05:53:08,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3249360.0, ans=0.2 2023-11-26 05:53:12,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3249360.0, ans=0.125 2023-11-26 05:53:34,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=12.0 2023-11-26 05:53:47,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.690e+01 9.179e+01 1.001e+02 1.387e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 05:53:52,271 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487450 2023-11-26 05:53:59,115 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6500, loss[loss=0.05836, simple_loss=0.082, pruned_loss=0.008037, audio_tagging_loss=0.00933, over 14621.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08958, pruned_loss=0.01259, audio_tagging_loss=0.009209, over 3031184.78 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:54:07,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2023-11-26 05:54:11,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0 2023-11-26 05:54:21,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3249826.6666666665, ans=0.125 2023-11-26 05:54:41,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-11-26 05:54:43,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3249960.0, ans=0.125 2023-11-26 05:54:48,358 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487500 2023-11-26 05:54:54,641 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6550, loss[loss=0.07649, simple_loss=0.1037, pruned_loss=0.01473, audio_tagging_loss=0.009906, over 14258.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08881, pruned_loss=0.0125, audio_tagging_loss=0.009071, over 3030541.71 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:55:34,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3250226.6666666665, ans=0.04949747468305833 2023-11-26 05:55:39,376 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.523e+01 8.995e+01 9.830e+01 1.214e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-26 05:55:43,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487550 2023-11-26 05:55:50,076 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6600, loss[loss=0.07042, simple_loss=0.1108, pruned_loss=0.01037, audio_tagging_loss=0.004636, over 16107.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08877, pruned_loss=0.0123, audio_tagging_loss=0.008876, over 3026338.93 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:56:12,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0 2023-11-26 05:56:34,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=22.5 2023-11-26 05:56:38,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3250626.6666666665, ans=0.0 2023-11-26 05:56:40,104 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487600 2023-11-26 05:56:40,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3250626.6666666665, ans=0.125 2023-11-26 05:56:47,247 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6650, loss[loss=0.05737, simple_loss=0.07633, pruned_loss=0.008669, audio_tagging_loss=0.01054, over 15485.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08953, pruned_loss=0.01245, audio_tagging_loss=0.008731, over 3030582.80 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:56:57,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3250760.0, ans=0.1 2023-11-26 05:57:10,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3250826.6666666665, ans=0.0 2023-11-26 05:57:24,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3250893.3333333335, ans=0.125 2023-11-26 05:57:32,018 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.660e+01 9.061e+01 9.694e+01 1.150e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-26 05:57:36,396 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487650 2023-11-26 05:57:40,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3250960.0, ans=0.025 2023-11-26 05:57:42,719 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6700, loss[loss=0.05302, simple_loss=0.07649, pruned_loss=0.00634, audio_tagging_loss=0.008434, over 15316.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08895, pruned_loss=0.01229, audio_tagging_loss=0.0087, over 3029313.98 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:57:42,958 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:57:53,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3251093.3333333335, ans=0.0 2023-11-26 05:58:05,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3251160.0, ans=0.09899494936611666 2023-11-26 05:58:13,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3251160.0, ans=0.1 2023-11-26 05:58:20,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3251226.6666666665, ans=0.125 2023-11-26 05:58:32,026 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487700 2023-11-26 05:58:32,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3251293.3333333335, ans=0.125 2023-11-26 05:58:38,292 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6750, loss[loss=0.09, simple_loss=0.1207, pruned_loss=0.01951, audio_tagging_loss=0.01016, over 15133.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08924, pruned_loss=0.01225, audio_tagging_loss=0.008721, over 3027444.75 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:58:43,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3251360.0, ans=0.125 2023-11-26 05:59:00,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3251493.3333333335, ans=0.0 2023-11-26 05:59:08,715 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:59:24,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.663e+01 9.356e+01 1.018e+02 1.599e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 05:59:27,707 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487750 2023-11-26 05:59:34,848 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6800, loss[loss=0.05792, simple_loss=0.07791, pruned_loss=0.01066, audio_tagging_loss=0.008306, over 14832.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08926, pruned_loss=0.01218, audio_tagging_loss=0.008802, over 3027696.62 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:59:42,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3251693.3333333335, ans=0.125 2023-11-26 06:00:24,533 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487800 2023-11-26 06:00:31,087 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6850, loss[loss=0.06437, simple_loss=0.08572, pruned_loss=0.01259, audio_tagging_loss=0.008921, over 15743.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08965, pruned_loss=0.01216, audio_tagging_loss=0.008813, over 3032329.31 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:00:48,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3252093.3333333335, ans=0.125 2023-11-26 06:01:15,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3252293.3333333335, ans=0.0 2023-11-26 06:01:16,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.578e+01 9.183e+01 9.945e+01 1.364e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 06:01:19,756 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487850 2023-11-26 06:01:21,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3252293.3333333335, ans=0.2 2023-11-26 06:01:26,608 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6900, loss[loss=0.06039, simple_loss=0.07765, pruned_loss=0.01335, audio_tagging_loss=0.00822, over 14561.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08827, pruned_loss=0.01188, audio_tagging_loss=0.008817, over 3033568.40 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:01:41,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3252426.6666666665, ans=0.025 2023-11-26 06:01:50,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3252493.3333333335, ans=0.5 2023-11-26 06:01:58,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3252493.3333333335, ans=0.125 2023-11-26 06:02:10,247 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:02:12,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3252626.6666666665, ans=0.125 2023-11-26 06:02:16,123 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487900 2023-11-26 06:02:22,933 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6950, loss[loss=0.0734, simple_loss=0.1111, pruned_loss=0.01184, audio_tagging_loss=0.006033, over 15871.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08896, pruned_loss=0.01206, audio_tagging_loss=0.008756, over 3031397.35 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:02:32,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.76 vs. 
limit=10.0 2023-11-26 06:02:54,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3252893.3333333335, ans=0.125 2023-11-26 06:03:10,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.822e+01 9.326e+01 1.010e+02 2.073e+02, threshold=1.865e+02, percent-clipped=1.0 2023-11-26 06:03:10,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=12.0 2023-11-26 06:03:12,259 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487950 2023-11-26 06:03:18,658 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7000, loss[loss=0.06287, simple_loss=0.08024, pruned_loss=0.01415, audio_tagging_loss=0.008607, over 13478.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.0889, pruned_loss=0.01207, audio_tagging_loss=0.008781, over 3035372.93 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:03:27,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-11-26 06:03:29,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3253093.3333333335, ans=0.0 2023-11-26 06:03:33,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-26 06:03:40,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=12.0 2023-11-26 06:03:49,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3253160.0, ans=0.125 2023-11-26 06:03:58,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3253226.6666666665, ans=0.0 2023-11-26 06:04:07,808 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488000 2023-11-26 06:04:16,282 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7050, loss[loss=0.05897, simple_loss=0.07314, pruned_loss=0.01417, audio_tagging_loss=0.008224, over 14521.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.0899, pruned_loss=0.01234, audio_tagging_loss=0.008752, over 3039521.42 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:04:25,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3253360.0, ans=0.0 2023-11-26 06:04:34,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3253426.6666666665, ans=0.125 2023-11-26 06:04:36,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3253426.6666666665, ans=0.2 2023-11-26 06:04:38,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3253493.3333333335, ans=0.125 2023-11-26 06:05:02,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.48 vs. 
limit=15.0 2023-11-26 06:05:02,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.684e+01 9.399e+01 1.022e+02 1.192e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 06:05:02,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3253626.6666666665, ans=0.125 2023-11-26 06:05:05,276 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488050 2023-11-26 06:05:11,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3253693.3333333335, ans=0.2 2023-11-26 06:05:12,710 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7100, loss[loss=0.0802, simple_loss=0.1094, pruned_loss=0.01671, audio_tagging_loss=0.008803, over 15111.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08947, pruned_loss=0.01258, audio_tagging_loss=0.008874, over 3044628.58 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:05:30,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3253760.0, ans=0.2 2023-11-26 06:05:38,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3253826.6666666665, ans=0.04949747468305833 2023-11-26 06:06:02,059 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488100 2023-11-26 06:06:08,416 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7150, loss[loss=0.06514, simple_loss=0.08373, pruned_loss=0.01134, audio_tagging_loss=0.01194, over 14521.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08976, pruned_loss=0.01269, audio_tagging_loss=0.008912, over 3037956.49 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:06:15,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3254026.6666666665, ans=0.125 2023-11-26 06:06:26,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3254093.3333333335, ans=0.0 2023-11-26 06:06:29,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-11-26 06:06:30,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3254160.0, ans=0.07 2023-11-26 06:06:52,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3254293.3333333335, ans=0.125 2023-11-26 06:06:54,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.934e+01 9.396e+01 1.011e+02 1.220e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 06:06:56,977 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488150 2023-11-26 06:06:57,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3254293.3333333335, ans=0.125 2023-11-26 06:07:03,192 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7200, loss[loss=0.06257, simple_loss=0.07364, pruned_loss=0.01591, audio_tagging_loss=0.009842, over 14988.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09049, pruned_loss=0.0126, audio_tagging_loss=0.008892, over 3043173.10 frames. 
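Editor's note: the per-batch `loss[...]` and running-average `tot_loss[...]` entries decompose the objective into `simple_loss`, `pruned_loss`, and `audio_tagging_loss` terms. The logged totals are consistent throughout this section with a fixed weighting of loss = 0.5·simple_loss + pruned_loss + audio_tagging_loss; the quick check below uses only the batch 7150 numbers above (the 0.5 weight is inferred from the logged values, not read from the training script).

```python
# Consistency check on the logged loss decomposition. Assumption (inferred
# from the numbers alone): loss = 0.5*simple_loss + pruned_loss +
# audio_tagging_loss.
simple, pruned, tagging = 0.08976, 0.01269, 0.008912  # batch 7150 tot_loss
total = 0.5 * simple + pruned + tagging
print(f"{total:.5f}")  # 0.06648 -- matches the logged tot_loss exactly
```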
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:07:23,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2023-11-26 06:07:23,955 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:07:38,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3254560.0, ans=0.07 2023-11-26 06:07:50,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3254626.6666666665, ans=0.0 2023-11-26 06:07:52,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488200 2023-11-26 06:07:59,923 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7250, loss[loss=0.09503, simple_loss=0.129, pruned_loss=0.02429, audio_tagging_loss=0.006225, over 16381.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08964, pruned_loss=0.01247, audio_tagging_loss=0.008995, over 3045389.70 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:08:02,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3254693.3333333335, ans=0.0 2023-11-26 06:08:12,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3254760.0, ans=0.0 2023-11-26 06:08:13,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3254760.0, ans=0.1 2023-11-26 06:08:19,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3254760.0, ans=0.1 2023-11-26 06:08:23,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3254826.6666666665, ans=0.07 2023-11-26 06:08:34,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3254893.3333333335, ans=0.0 2023-11-26 06:08:38,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0 2023-11-26 06:08:41,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3254893.3333333335, ans=0.125 2023-11-26 06:08:47,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.575e+01 9.064e+01 9.788e+01 1.213e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-26 06:08:49,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488250 2023-11-26 06:08:56,394 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7300, loss[loss=0.06023, simple_loss=0.09179, pruned_loss=0.007803, audio_tagging_loss=0.006538, over 14414.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09022, pruned_loss=0.01246, audio_tagging_loss=0.008857, over 3050790.33 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:08:57,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3255026.6666666665, ans=0.1 2023-11-26 06:08:58,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3255026.6666666665, ans=0.0 2023-11-26 06:09:06,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3255093.3333333335, ans=0.0 2023-11-26 06:09:45,392 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488300 2023-11-26 06:09:51,722 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7350, loss[loss=0.06226, simple_loss=0.09146, pruned_loss=0.01156, audio_tagging_loss=0.004976, over 13849.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08969, pruned_loss=0.01238, audio_tagging_loss=0.008705, over 3051467.72 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:09:52,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5 2023-11-26 06:10:02,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3255426.6666666665, ans=0.125 2023-11-26 06:10:39,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.543e+01 9.108e+01 9.776e+01 1.189e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 06:10:40,718 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488350 2023-11-26 06:10:47,657 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7400, loss[loss=0.05956, simple_loss=0.07875, pruned_loss=0.009898, audio_tagging_loss=0.01029, over 15650.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08885, pruned_loss=0.01215, audio_tagging_loss=0.008781, over 3049229.48 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:10:49,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3255693.3333333335, ans=0.1 2023-11-26 06:10:57,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3255693.3333333335, ans=0.125 2023-11-26 06:11:03,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3255760.0, ans=0.2 2023-11-26 06:11:07,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-11-26 06:11:09,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3255826.6666666665, ans=0.125 2023-11-26 06:11:18,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3255826.6666666665, ans=0.0 2023-11-26 06:11:21,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3255893.3333333335, ans=0.1 2023-11-26 06:11:28,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.84 vs. 
limit=10.0 2023-11-26 06:11:31,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3255960.0, ans=0.125 2023-11-26 06:11:31,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-26 06:11:37,325 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488400 2023-11-26 06:11:44,997 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7450, loss[loss=0.05938, simple_loss=0.08311, pruned_loss=0.01019, audio_tagging_loss=0.00764, over 16282.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08825, pruned_loss=0.01223, audio_tagging_loss=0.008849, over 3045514.48 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:11:49,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3256026.6666666665, ans=0.0 2023-11-26 06:11:50,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3256026.6666666665, ans=0.015 2023-11-26 06:11:50,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3256026.6666666665, ans=0.125 2023-11-26 06:12:30,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3256293.3333333335, ans=0.025 2023-11-26 06:12:32,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 8.793e+01 9.296e+01 1.001e+02 1.337e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 06:12:33,500 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488450 2023-11-26 06:12:39,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3256360.0, ans=0.125 2023-11-26 06:12:39,878 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7500, loss[loss=0.05309, simple_loss=0.07197, pruned_loss=0.007953, audio_tagging_loss=0.009154, over 15589.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08896, pruned_loss=0.0123, audio_tagging_loss=0.008836, over 3049061.09 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:12:42,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2023-11-26 06:12:55,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3256426.6666666665, ans=0.1 2023-11-26 06:13:21,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3256560.0, ans=0.0 2023-11-26 06:13:29,028 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488500 2023-11-26 06:13:35,299 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7550, loss[loss=0.07883, simple_loss=0.1124, pruned_loss=0.01699, audio_tagging_loss=0.005624, over 15808.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08936, pruned_loss=0.01236, audio_tagging_loss=0.008821, over 3053174.38 frames. 
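Editor's note: the periodic `optim.py` lines report five order statistics of recent gradient norms (min/Q1/median/Q3/max) together with a clipping threshold. Throughout this section the threshold tracks `Clipping_scale` times the logged median (e.g. 2.0 × 9.296e+01 ≈ 1.859e+02 just above), so a plausible reading, sketched below with hypothetical names, is median-relative gradient clipping, with `percent-clipped` the share of recent batches whose norm exceeded the threshold. This is an inference from the numbers, not the actual `optim.py` code.

```python
# Sketch of the diagnostic behind "grad-norm quartiles ... threshold ...
# percent-clipped". Assumption: threshold = clipping_scale * median, which
# matches the logged values above.
import torch

def clipping_stats(recent_norms, clipping_scale=2.0):
    norms = torch.tensor(recent_norms)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale the median
    pct_clipped = 100.0 * (norms > threshold).float().mean()
    return q, threshold, pct_clipped

q, thr, pct = clipping_stats([74.0, 88.0, 93.0, 100.0, 134.0])
print(q.tolist(), float(thr), float(pct))  # threshold = 2 * median = 186.0
```

Scaling the clip threshold off the running median keeps clipping rare under normal conditions (hence `percent-clipped=0.0` on most reports) while still catching outlier batches.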
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:14:07,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3256826.6666666665, ans=0.0 2023-11-26 06:14:14,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3256893.3333333335, ans=0.05 2023-11-26 06:14:23,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 9.000e+01 9.495e+01 1.038e+02 1.345e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 06:14:25,079 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488550 2023-11-26 06:14:26,217 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:14:31,446 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7600, loss[loss=0.08068, simple_loss=0.1144, pruned_loss=0.01767, audio_tagging_loss=0.005802, over 16469.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08895, pruned_loss=0.01241, audio_tagging_loss=0.008933, over 3060288.18 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:14:41,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3257026.6666666665, ans=0.125 2023-11-26 06:14:57,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3257160.0, ans=15.0 2023-11-26 06:14:59,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3257160.0, ans=0.0 2023-11-26 06:15:14,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3257226.6666666665, ans=0.1 2023-11-26 06:15:20,847 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488600 2023-11-26 06:15:23,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3257293.3333333335, ans=0.125 2023-11-26 06:15:24,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3257293.3333333335, ans=0.0 2023-11-26 06:15:27,864 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7650, loss[loss=0.06127, simple_loss=0.08575, pruned_loss=0.009858, audio_tagging_loss=0.008535, over 15077.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08872, pruned_loss=0.01245, audio_tagging_loss=0.008867, over 3061177.80 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:15:28,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3257360.0, ans=0.0 2023-11-26 06:15:31,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3257360.0, ans=0.125 2023-11-26 06:15:37,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3257426.6666666665, ans=0.05 2023-11-26 06:15:48,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-26 06:15:53,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. 
limit=22.5 2023-11-26 06:16:08,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.39 vs. limit=22.5 2023-11-26 06:16:16,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.718e+01 9.418e+01 1.004e+02 2.180e+02, threshold=1.884e+02, percent-clipped=1.0 2023-11-26 06:16:16,775 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488650 2023-11-26 06:16:16,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3257626.6666666665, ans=0.0 2023-11-26 06:16:16,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3257626.6666666665, ans=0.0 2023-11-26 06:16:23,057 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7700, loss[loss=0.07661, simple_loss=0.1128, pruned_loss=0.01363, audio_tagging_loss=0.006585, over 15607.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.0897, pruned_loss=0.01251, audio_tagging_loss=0.008729, over 3058850.97 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:16:26,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3257693.3333333335, ans=0.125 2023-11-26 06:16:53,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=15.0 2023-11-26 06:17:08,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0 2023-11-26 06:17:11,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2023-11-26 06:17:12,416 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488700 2023-11-26 06:17:19,367 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7750, loss[loss=0.05763, simple_loss=0.07501, pruned_loss=0.01148, audio_tagging_loss=0.008645, over 14540.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.0902, pruned_loss=0.01266, audio_tagging_loss=0.00873, over 3061166.83 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:17:28,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3258026.6666666665, ans=0.125 2023-11-26 06:17:41,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. 
limit=6.0 2023-11-26 06:17:44,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3258160.0, ans=0.0 2023-11-26 06:18:00,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3258226.6666666665, ans=0.125 2023-11-26 06:18:08,730 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.599e+01 9.200e+01 9.734e+01 1.299e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 06:18:08,830 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488750 2023-11-26 06:18:15,096 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7800, loss[loss=0.05282, simple_loss=0.06564, pruned_loss=0.008934, audio_tagging_loss=0.01106, over 15598.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09063, pruned_loss=0.01256, audio_tagging_loss=0.008747, over 3057327.94 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:18:30,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3258426.6666666665, ans=0.04949747468305833 2023-11-26 06:18:31,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3258426.6666666665, ans=0.1 2023-11-26 06:18:54,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3258560.0, ans=0.0 2023-11-26 06:19:04,827 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488800 2023-11-26 06:19:11,388 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7850, loss[loss=0.05482, simple_loss=0.06115, pruned_loss=0.01337, audio_tagging_loss=0.01087, over 15013.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08975, pruned_loss=0.01238, audio_tagging_loss=0.008881, over 3046590.89 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:19:12,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3258693.3333333335, ans=0.0 2023-11-26 06:19:23,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3258760.0, ans=0.125 2023-11-26 06:19:26,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3258760.0, ans=0.125 2023-11-26 06:19:29,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3258760.0, ans=0.125 2023-11-26 06:19:47,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3258893.3333333335, ans=0.125 2023-11-26 06:19:57,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3258960.0, ans=0.04949747468305833 2023-11-26 06:20:00,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.695e+01 9.194e+01 9.770e+01 1.489e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 06:20:00,855 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488850 2023-11-26 06:20:01,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3258960.0, ans=0.1 2023-11-26 06:20:07,643 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7900, loss[loss=0.04486, simple_loss=0.05538, pruned_loss=0.007199, audio_tagging_loss=0.009971, over 14344.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08906, pruned_loss=0.01237, audio_tagging_loss=0.009022, over 3050254.58 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:20:09,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3259026.6666666665, ans=0.125 2023-11-26 06:20:10,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3259026.6666666665, ans=0.0 2023-11-26 06:20:31,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0 2023-11-26 06:20:41,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3259226.6666666665, ans=0.0 2023-11-26 06:20:57,367 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488900 2023-11-26 06:21:03,738 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7950, loss[loss=0.05013, simple_loss=0.0638, pruned_loss=0.00823, audio_tagging_loss=0.01, over 13783.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.0889, pruned_loss=0.01238, audio_tagging_loss=0.009114, over 3056337.20 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:21:07,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3259360.0, ans=0.0 2023-11-26 06:21:13,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-26 06:21:16,903 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:21:26,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3259493.3333333335, ans=0.125 2023-11-26 06:21:37,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3259560.0, ans=0.0 2023-11-26 06:21:52,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.753e+01 9.407e+01 1.023e+02 1.871e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-26 06:21:52,392 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488950 2023-11-26 06:21:54,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-26 06:21:58,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-11-26 06:21:59,114 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8000, loss[loss=0.08389, simple_loss=0.1102, pruned_loss=0.01988, audio_tagging_loss=0.008889, over 15514.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.0891, pruned_loss=0.01259, audio_tagging_loss=0.009116, over 3049400.46 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:21:59,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3259693.3333333335, ans=0.0 2023-11-26 06:22:04,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3259693.3333333335, ans=0.125 2023-11-26 06:22:28,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3259826.6666666665, ans=0.0 2023-11-26 06:22:34,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3259893.3333333335, ans=0.0 2023-11-26 06:22:35,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3259893.3333333335, ans=0.125 2023-11-26 06:22:36,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3259893.3333333335, ans=0.1 2023-11-26 06:22:48,445 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489000 2023-11-26 06:22:48,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3259960.0, ans=0.125 2023-11-26 06:22:50,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3259960.0, ans=0.0 2023-11-26 06:22:54,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3260026.6666666665, ans=0.1 2023-11-26 06:22:55,046 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8050, loss[loss=0.06949, simple_loss=0.09913, pruned_loss=0.01143, audio_tagging_loss=0.008493, over 15149.00 frames. 
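Editor's note: the `WARNING ... Exclude cut` entries show a sanity filter at work. The AudioSet placeholder transcript tokenizes to 24 tokens, but a one-second cut yields only 100 feature frames, i.e. 23 frames after subsampling (both counts are in the warning), and the transducer loss is undefined when the encoder output is shorter than the label sequence, so the cut is dropped. A minimal sketch of such a filter follows; the exact subsampling formula is an assumption chosen to match the logged 100 → 23.

```python
# Sketch of the frame/token sanity filter behind the WARNING lines.
# Assumption: a Conv2d-style frontend reducing T -> (T - 7) // 2 // 2,
# which reproduces the logged 100 -> 23.
def frames_after_subsampling(num_frames: int) -> int:
    return (num_frames - 7) // 2 // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as in the log
print(keep_cut(100, 24))              # False -> cut is excluded
```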
], tot_loss[loss=0.06693, simple_loss=0.09009, pruned_loss=0.01278, audio_tagging_loss=0.009102, over 3054808.97 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:22:59,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3260026.6666666665, ans=0.1 2023-11-26 06:23:11,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3260093.3333333335, ans=0.125 2023-11-26 06:23:15,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3260093.3333333335, ans=0.0 2023-11-26 06:23:31,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2023-11-26 06:23:37,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3260226.6666666665, ans=0.09899494936611666 2023-11-26 06:23:43,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3260293.3333333335, ans=0.1 2023-11-26 06:23:44,635 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489050 2023-11-26 06:23:44,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3260293.3333333335, ans=0.125 2023-11-26 06:23:46,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.810e+01 9.339e+01 9.946e+01 1.266e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 06:23:51,459 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8100, loss[loss=0.05053, simple_loss=0.06872, pruned_loss=0.006151, audio_tagging_loss=0.01002, over 14698.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09022, pruned_loss=0.01274, audio_tagging_loss=0.009043, over 3048582.46 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:23:51,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3260360.0, ans=0.0 2023-11-26 06:23:57,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3260360.0, ans=0.125 2023-11-26 06:24:05,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3260426.6666666665, ans=0.015 2023-11-26 06:24:08,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3260426.6666666665, ans=0.125 2023-11-26 06:24:26,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3260560.0, ans=0.125 2023-11-26 06:24:35,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3260626.6666666665, ans=0.0 2023-11-26 06:24:40,655 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489100 2023-11-26 06:24:46,949 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8150, loss[loss=0.0608, simple_loss=0.08484, pruned_loss=0.01046, audio_tagging_loss=0.007926, over 16292.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.0904, pruned_loss=0.01277, audio_tagging_loss=0.008897, over 3049064.85 frames. 
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:24:51,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3260693.3333333335, ans=0.1 2023-11-26 06:25:21,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3260893.3333333335, ans=0.125 2023-11-26 06:25:28,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3260893.3333333335, ans=0.05 2023-11-26 06:25:35,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489150 2023-11-26 06:25:37,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.636e+01 9.236e+01 1.005e+02 1.829e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 06:25:43,406 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8200, loss[loss=0.08201, simple_loss=0.1191, pruned_loss=0.01522, audio_tagging_loss=0.007266, over 16010.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09103, pruned_loss=0.01286, audio_tagging_loss=0.008784, over 3048248.75 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 06:25:44,458 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:25:47,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=12.0 2023-11-26 06:26:00,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3261093.3333333335, ans=0.05 2023-11-26 06:26:00,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3261093.3333333335, ans=0.1 2023-11-26 06:26:12,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3261160.0, ans=0.0 2023-11-26 06:26:18,971 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:26:33,278 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489200 2023-11-26 06:26:40,314 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8250, loss[loss=0.05278, simple_loss=0.0659, pruned_loss=0.0112, audio_tagging_loss=0.00863, over 14192.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08974, pruned_loss=0.01266, audio_tagging_loss=0.008862, over 3045867.42 frames. 
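Editor's note: the `grad_scale` field in the batch lines is the mixed-precision loss scale, and it moves in powers of two across this section (32 → 16 → 8 above, then back up to 16 at batch 8400 and 32 at batch 8800 below). That is the standard dynamic behavior of a gradient scaler: halve on a step with inf/nan gradients, grow back after a run of clean steps. A minimal sketch using PyTorch's `torch.cuda.amp` follows (the API is assumed from the fp16 behavior visible in the log; the `init_scale`/`growth_interval` values are illustrative, not the run's real settings, and the scaling only engages on a CUDA device).

```python
# Minimal fp16 training step showing why grad_scale oscillates in the log.
import torch

model = torch.nn.Linear(10, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(x, y):
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(opt)   # skipped internally if inf/nan grads were detected
    scaler.update()    # halves the scale on overflow, grows it when clean
    return scaler.get_scale()
```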
], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:26:52,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3261426.6666666665, ans=0.125 2023-11-26 06:27:05,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3261493.3333333335, ans=0.125 2023-11-26 06:27:06,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3261493.3333333335, ans=0.0 2023-11-26 06:27:29,818 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489250 2023-11-26 06:27:31,869 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 8.764e+01 9.523e+01 1.021e+02 1.378e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 06:27:36,103 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8300, loss[loss=0.08699, simple_loss=0.1198, pruned_loss=0.01894, audio_tagging_loss=0.008157, over 17639.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08907, pruned_loss=0.01254, audio_tagging_loss=0.008938, over 3051501.18 frames. ], batch size: 67, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:27:42,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3261693.3333333335, ans=0.0 2023-11-26 06:28:09,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3261893.3333333335, ans=0.125 2023-11-26 06:28:12,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3261893.3333333335, ans=0.09899494936611666 2023-11-26 06:28:13,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3261893.3333333335, ans=0.1 2023-11-26 06:28:25,263 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489300 2023-11-26 06:28:32,188 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8350, loss[loss=0.07005, simple_loss=0.09777, pruned_loss=0.01463, audio_tagging_loss=0.006535, over 14089.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08928, pruned_loss=0.01247, audio_tagging_loss=0.00886, over 3054244.27 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:29:21,871 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489350 2023-11-26 06:29:22,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2023-11-26 06:29:23,910 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.424e+01 8.707e+01 9.107e+01 9.856e+01 1.432e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-26 06:29:28,771 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8400, loss[loss=0.07336, simple_loss=0.09691, pruned_loss=0.01521, audio_tagging_loss=0.009693, over 16034.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08922, pruned_loss=0.01244, audio_tagging_loss=0.008914, over 3047624.23 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:29:38,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.51 vs. 
limit=22.5 2023-11-26 06:29:48,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3262426.6666666665, ans=0.1 2023-11-26 06:29:48,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3262426.6666666665, ans=0.125 2023-11-26 06:30:13,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3262626.6666666665, ans=0.09899494936611666 2023-11-26 06:30:17,891 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489400 2023-11-26 06:30:24,468 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8450, loss[loss=0.05346, simple_loss=0.06688, pruned_loss=0.009247, audio_tagging_loss=0.01077, over 15018.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08929, pruned_loss=0.01243, audio_tagging_loss=0.008944, over 3052203.57 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:30:24,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3262693.3333333335, ans=0.2 2023-11-26 06:30:28,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2023-11-26 06:30:28,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0 2023-11-26 06:30:34,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3262760.0, ans=0.125 2023-11-26 06:30:35,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3262760.0, ans=0.125 2023-11-26 06:30:37,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-11-26 06:30:43,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3262760.0, ans=0.125 2023-11-26 06:30:52,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3262826.6666666665, ans=0.2 2023-11-26 06:30:57,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0 2023-11-26 06:31:02,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3262893.3333333335, ans=0.125 2023-11-26 06:31:13,406 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489450 2023-11-26 06:31:13,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3262960.0, ans=0.2 2023-11-26 06:31:15,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.909e+01 9.451e+01 1.011e+02 1.331e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 06:31:20,191 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8500, loss[loss=0.06288, simple_loss=0.08003, pruned_loss=0.01411, audio_tagging_loss=0.00876, over 14738.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.0893, pruned_loss=0.0124, audio_tagging_loss=0.008895, over 3053416.27 frames. 
], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:31:22,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3263026.6666666665, ans=0.125 2023-11-26 06:31:23,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.79 vs. limit=10.0 2023-11-26 06:32:09,407 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489500 2023-11-26 06:32:16,191 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8550, loss[loss=0.06484, simple_loss=0.08428, pruned_loss=0.01405, audio_tagging_loss=0.008657, over 15592.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08921, pruned_loss=0.01231, audio_tagging_loss=0.008951, over 3058233.98 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:32:19,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3263360.0, ans=0.125 2023-11-26 06:32:19,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3263360.0, ans=0.125 2023-11-26 06:32:45,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-26 06:32:45,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3263493.3333333335, ans=0.125 2023-11-26 06:33:05,641 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489550 2023-11-26 06:33:07,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.883e+01 9.307e+01 9.956e+01 1.247e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 06:33:10,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3263626.6666666665, ans=0.05 2023-11-26 06:33:11,965 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8600, loss[loss=0.07542, simple_loss=0.1, pruned_loss=0.01736, audio_tagging_loss=0.008042, over 14792.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08982, pruned_loss=0.01234, audio_tagging_loss=0.008846, over 3060642.47 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:33:13,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3263693.3333333335, ans=10.0 2023-11-26 06:34:00,556 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489600 2023-11-26 06:34:05,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3263960.0, ans=0.125 2023-11-26 06:34:07,034 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8650, loss[loss=0.08231, simple_loss=0.1136, pruned_loss=0.01773, audio_tagging_loss=0.007784, over 16606.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09088, pruned_loss=0.01252, audio_tagging_loss=0.008832, over 3054013.83 frames. 
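Editor's note: the `Whitening: ... metric=X vs. limit=Y` lines are activation-covariance diagnostics logged against each module's configured limit. One common definition of such a metric, sketched below, is the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the feature covariance: it is approximately 1.0 for white (isotropic) features and grows toward `num_channels` as a few directions dominate, which fits the logged values (e.g. 7.03 against a limit of 12.0 for 384 channels). The definition here is an assumption, not the actual `scaling.py` code.

```python
# Sketch of a whitening metric of the kind logged above. Assumption:
# metric = mean(eig^2) / mean(eig)^2 over the activation covariance;
# ~1.0 for white features, larger when the covariance is anisotropic.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 384)       # near-white input
print(whitening_metric(x))       # close to 1.0, well under a limit of 12.0
```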
], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:34:07,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3264026.6666666665, ans=0.125 2023-11-26 06:34:24,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3264093.3333333335, ans=0.125 2023-11-26 06:34:24,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3264093.3333333335, ans=0.125 2023-11-26 06:34:29,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3264160.0, ans=0.125 2023-11-26 06:34:35,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2023-11-26 06:34:38,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3264160.0, ans=0.125 2023-11-26 06:34:53,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3264293.3333333335, ans=0.0 2023-11-26 06:34:56,618 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489650 2023-11-26 06:34:58,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.575e+01 9.501e+01 1.015e+02 1.798e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 06:35:03,407 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8700, loss[loss=0.06159, simple_loss=0.07578, pruned_loss=0.0129, audio_tagging_loss=0.0108, over 14731.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09098, pruned_loss=0.01259, audio_tagging_loss=0.008886, over 3051293.53 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:35:08,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3264360.0, ans=0.125 2023-11-26 06:35:10,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3264360.0, ans=0.1 2023-11-26 06:35:29,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3264493.3333333335, ans=0.125 2023-11-26 06:35:31,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3264493.3333333335, ans=0.125 2023-11-26 06:35:39,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3264560.0, ans=0.1 2023-11-26 06:35:50,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3264626.6666666665, ans=10.0 2023-11-26 06:35:52,876 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489700 2023-11-26 06:35:59,777 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8750, loss[loss=0.0586, simple_loss=0.08194, pruned_loss=0.00951, audio_tagging_loss=0.008124, over 15793.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09141, pruned_loss=0.01256, audio_tagging_loss=0.008864, over 3052282.62 frames. 
], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:36:32,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=22.5 2023-11-26 06:36:48,807 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489750 2023-11-26 06:36:50,786 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.923e+01 8.719e+01 9.577e+01 1.009e+02 1.331e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 06:36:55,058 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8800, loss[loss=0.06264, simple_loss=0.0883, pruned_loss=0.009932, audio_tagging_loss=0.008561, over 14689.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09177, pruned_loss=0.01268, audio_tagging_loss=0.00894, over 3062884.32 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:37:02,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.90 vs. limit=15.0 2023-11-26 06:37:05,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3265093.3333333335, ans=0.0 2023-11-26 06:37:10,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3265093.3333333335, ans=0.0 2023-11-26 06:37:25,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3265160.0, ans=0.125 2023-11-26 06:37:43,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3265293.3333333335, ans=0.0 2023-11-26 06:37:44,510 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489800 2023-11-26 06:37:51,522 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8850, loss[loss=0.07472, simple_loss=0.1015, pruned_loss=0.01573, audio_tagging_loss=0.008244, over 14811.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09133, pruned_loss=0.01258, audio_tagging_loss=0.009006, over 3053979.04 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:37:56,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-11-26 06:38:02,762 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 06:38:05,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3265426.6666666665, ans=0.125 2023-11-26 06:38:06,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3265426.6666666665, ans=0.2 2023-11-26 06:38:14,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=3265493.3333333335, ans=15.0 2023-11-26 06:38:28,338 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:38:34,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2023-11-26 06:38:37,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=12.0 2023-11-26 06:38:40,969 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489850 2023-11-26 06:38:44,015 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.713e+01 9.492e+01 1.007e+02 1.202e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 06:38:46,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3265693.3333333335, ans=0.07 2023-11-26 06:38:47,337 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8900, loss[loss=0.07859, simple_loss=0.1049, pruned_loss=0.01651, audio_tagging_loss=0.00965, over 16029.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09178, pruned_loss=0.01276, audio_tagging_loss=0.008898, over 3054922.30 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:38:57,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3265760.0, ans=0.2 2023-11-26 06:39:23,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3265893.3333333335, ans=0.04949747468305833 2023-11-26 06:39:31,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3265960.0, ans=0.125 2023-11-26 06:39:35,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3265960.0, ans=0.0 2023-11-26 06:39:36,866 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489900 2023-11-26 06:39:43,105 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8950, loss[loss=0.0627, simple_loss=0.08792, pruned_loss=0.01289, audio_tagging_loss=0.00585, over 16904.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09092, pruned_loss=0.01258, audio_tagging_loss=0.008698, over 3048799.29 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:40:08,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3266160.0, ans=0.2 2023-11-26 06:40:13,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.80 vs. 
limit=10.0 2023-11-26 06:40:22,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3266226.6666666665, ans=0.05 2023-11-26 06:40:32,191 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489950 2023-11-26 06:40:33,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3266293.3333333335, ans=0.09899494936611666 2023-11-26 06:40:35,291 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.881e+01 9.559e+01 9.968e+01 1.237e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 06:40:38,551 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9000, loss[loss=0.06969, simple_loss=0.104, pruned_loss=0.01095, audio_tagging_loss=0.006719, over 15832.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09223, pruned_loss=0.01283, audio_tagging_loss=0.008582, over 3059195.99 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:40:38,552 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 06:41:04,799 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8452, 4.9650, 5.0927, 4.8922], device='cuda:1') 2023-11-26 06:41:05,260 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2915, 2.9723, 3.1984, 3.0136, 3.6893, 3.7776, 3.1932, 3.2072], device='cuda:1') 2023-11-26 06:41:10,846 INFO [train_asr.py:1267] (1/4) Epoch 41, validation: loss=0.05835, simple_loss=0.05057, pruned_loss=0.005166, audio_tagging_loss=0.0279, over 4681554.00 frames. 2023-11-26 06:41:10,846 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 06:41:11,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3266360.0, ans=0.0 2023-11-26 06:41:20,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3266426.6666666665, ans=0.015 2023-11-26 06:41:30,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3266426.6666666665, ans=0.125 2023-11-26 06:41:59,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3266626.6666666665, ans=0.05 2023-11-26 06:41:59,930 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490000 2023-11-26 06:42:06,616 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9050, loss[loss=0.07356, simple_loss=0.1032, pruned_loss=0.01245, audio_tagging_loss=0.00951, over 15614.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09137, pruned_loss=0.0127, audio_tagging_loss=0.00856, over 3057245.29 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:42:19,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3266760.0, ans=0.0 2023-11-26 06:42:21,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3266760.0, ans=0.035 2023-11-26 06:42:24,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3266760.0, ans=0.125 2023-11-26 06:42:38,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3266826.6666666665, ans=0.0 2023-11-26 06:42:56,361 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490050 2023-11-26 06:42:59,334 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.756e+01 9.461e+01 1.032e+02 1.293e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 06:43:03,101 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9100, loss[loss=0.06342, simple_loss=0.09288, pruned_loss=0.0104, audio_tagging_loss=0.006576, over 14052.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09206, pruned_loss=0.01275, audio_tagging_loss=0.008491, over 3053242.91 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:43:12,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3267026.6666666665, ans=0.0 2023-11-26 06:43:13,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3267093.3333333335, ans=0.125 2023-11-26 06:43:52,693 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490100 2023-11-26 06:43:58,969 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9150, loss[loss=0.07115, simple_loss=0.08959, pruned_loss=0.01859, audio_tagging_loss=0.007769, over 15287.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09215, pruned_loss=0.01279, audio_tagging_loss=0.00851, over 3055073.89 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:44:15,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3267426.6666666665, ans=0.125 2023-11-26 06:44:24,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3267493.3333333335, ans=0.0 2023-11-26 06:44:28,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3267493.3333333335, ans=0.125 2023-11-26 06:44:34,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3267560.0, ans=0.1 2023-11-26 06:44:40,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3267560.0, ans=0.125 2023-11-26 06:44:47,824 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490150 2023-11-26 06:44:50,863 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.875e+01 9.458e+01 1.013e+02 1.353e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 06:44:54,030 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9200, loss[loss=0.07631, simple_loss=0.1097, pruned_loss=0.01509, audio_tagging_loss=0.006356, over 14940.00 frames. 
], tot_loss[loss=0.0666, simple_loss=0.09075, pruned_loss=0.01263, audio_tagging_loss=0.008597, over 3049530.62 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:44:54,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0 2023-11-26 06:45:11,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3267760.0, ans=0.125 2023-11-26 06:45:11,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2023-11-26 06:45:43,775 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490200 2023-11-26 06:45:51,045 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9250, loss[loss=0.0508, simple_loss=0.06656, pruned_loss=0.009604, audio_tagging_loss=0.007916, over 14870.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09034, pruned_loss=0.01259, audio_tagging_loss=0.008621, over 3052214.80 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:45:54,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3268026.6666666665, ans=0.1 2023-11-26 06:46:15,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2023-11-26 06:46:21,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3268160.0, ans=0.09899494936611666 2023-11-26 06:46:25,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.70 vs. limit=22.5 2023-11-26 06:46:29,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3268226.6666666665, ans=0.125 2023-11-26 06:46:31,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0 2023-11-26 06:46:39,720 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490250 2023-11-26 06:46:43,463 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.603e+01 9.080e+01 9.924e+01 1.383e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-26 06:46:46,722 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9300, loss[loss=0.07795, simple_loss=0.1095, pruned_loss=0.0133, audio_tagging_loss=0.009882, over 15900.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08966, pruned_loss=0.01248, audio_tagging_loss=0.008714, over 3053969.76 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:46:53,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3268360.0, ans=0.125 2023-11-26 06:47:14,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. 
limit=15.0 2023-11-26 06:47:18,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3268560.0, ans=0.0 2023-11-26 06:47:26,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3268560.0, ans=0.2 2023-11-26 06:47:34,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3268626.6666666665, ans=10.0 2023-11-26 06:47:35,438 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490300 2023-11-26 06:47:41,738 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9350, loss[loss=0.04738, simple_loss=0.06109, pruned_loss=0.006778, audio_tagging_loss=0.01005, over 16445.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08935, pruned_loss=0.01252, audio_tagging_loss=0.00878, over 3057625.52 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:47:59,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3268760.0, ans=0.125 2023-11-26 06:48:14,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. limit=6.0 2023-11-26 06:48:14,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3268893.3333333335, ans=0.1 2023-11-26 06:48:22,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3268893.3333333335, ans=0.0 2023-11-26 06:48:31,059 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490350 2023-11-26 06:48:34,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.924e+01 9.559e+01 1.022e+02 1.389e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 06:48:37,929 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9400, loss[loss=0.05628, simple_loss=0.07693, pruned_loss=0.008101, audio_tagging_loss=0.009717, over 16542.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08966, pruned_loss=0.01267, audio_tagging_loss=0.008817, over 3057350.41 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:48:48,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3269093.3333333335, ans=0.125 2023-11-26 06:49:05,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2023-11-26 06:49:27,316 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490400 2023-11-26 06:49:32,333 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 06:49:33,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3269360.0, ans=0.0 2023-11-26 06:49:34,420 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9450, loss[loss=0.06591, simple_loss=0.097, pruned_loss=0.0107, audio_tagging_loss=0.006714, over 15729.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09004, pruned_loss=0.01258, audio_tagging_loss=0.008925, over 3060919.93 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:49:39,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3269360.0, ans=0.125 2023-11-26 06:50:23,253 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490450 2023-11-26 06:50:26,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.849e+01 9.435e+01 1.031e+02 1.248e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 06:50:29,513 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9500, loss[loss=0.06372, simple_loss=0.08696, pruned_loss=0.009145, audio_tagging_loss=0.0111, over 14843.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09087, pruned_loss=0.01262, audio_tagging_loss=0.008907, over 3053746.57 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:50:32,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2023-11-26 06:50:33,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3269693.3333333335, ans=0.0 2023-11-26 06:50:34,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3269693.3333333335, ans=0.125 2023-11-26 06:50:46,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3269760.0, ans=0.125 2023-11-26 06:51:18,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=22.5 2023-11-26 06:51:18,670 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490500 2023-11-26 06:51:21,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3269960.0, ans=0.05 2023-11-26 06:51:25,487 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9550, loss[loss=0.07595, simple_loss=0.1008, pruned_loss=0.0149, audio_tagging_loss=0.01065, over 14856.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.0914, pruned_loss=0.01251, audio_tagging_loss=0.008997, over 3045935.60 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:51:40,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3270093.3333333335, ans=0.1 2023-11-26 06:51:41,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3270093.3333333335, ans=0.0 2023-11-26 06:52:00,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3270226.6666666665, ans=0.0 2023-11-26 06:52:00,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3270226.6666666665, ans=0.0 2023-11-26 06:52:03,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3270226.6666666665, ans=0.07 2023-11-26 06:52:05,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3270226.6666666665, ans=0.125 2023-11-26 06:52:08,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-11-26 06:52:15,543 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490550 2023-11-26 06:52:18,581 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.931e+01 9.591e+01 1.034e+02 1.211e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 06:52:22,419 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9600, loss[loss=0.06491, simple_loss=0.09892, pruned_loss=0.007103, audio_tagging_loss=0.008345, over 14202.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.091, pruned_loss=0.01244, audio_tagging_loss=0.009132, over 3044609.72 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:52:53,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3270493.3333333335, ans=0.0 2023-11-26 06:53:06,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3270626.6666666665, ans=0.0 2023-11-26 06:53:10,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3270626.6666666665, ans=0.0 2023-11-26 06:53:11,612 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490600 2023-11-26 06:53:15,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3270626.6666666665, ans=15.0 2023-11-26 06:53:18,175 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9650, loss[loss=0.05964, simple_loss=0.08593, pruned_loss=0.009487, audio_tagging_loss=0.007185, over 15294.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09069, pruned_loss=0.0124, audio_tagging_loss=0.009109, over 3045138.44 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:53:33,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.16 vs. 
limit=15.0 2023-11-26 06:54:04,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3270960.0, ans=0.1 2023-11-26 06:54:07,185 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490650 2023-11-26 06:54:08,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3270960.0, ans=0.0 2023-11-26 06:54:10,213 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.629e+01 9.120e+01 1.007e+02 1.405e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 06:54:13,996 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9700, loss[loss=0.0827, simple_loss=0.1053, pruned_loss=0.01922, audio_tagging_loss=0.01081, over 14291.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09137, pruned_loss=0.01265, audio_tagging_loss=0.008914, over 3041655.71 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:54:36,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3271160.0, ans=0.125 2023-11-26 06:55:03,201 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490700 2023-11-26 06:55:08,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3271293.3333333335, ans=0.04949747468305833 2023-11-26 06:55:10,695 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9750, loss[loss=0.06802, simple_loss=0.1116, pruned_loss=0.006512, audio_tagging_loss=0.005699, over 14992.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09129, pruned_loss=0.01248, audio_tagging_loss=0.008738, over 3034375.50 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:55:14,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3271360.0, ans=0.0 2023-11-26 06:55:17,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2023-11-26 06:55:22,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3271426.6666666665, ans=0.1 2023-11-26 06:55:48,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3271560.0, ans=0.125 2023-11-26 06:55:59,916 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490750 2023-11-26 06:56:03,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.706e+01 9.282e+01 1.012e+02 1.180e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 06:56:06,098 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9800, loss[loss=0.06937, simple_loss=0.09492, pruned_loss=0.01258, audio_tagging_loss=0.00933, over 15118.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09, pruned_loss=0.01244, audio_tagging_loss=0.00867, over 3029571.09 frames. 
], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:56:06,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3271693.3333333335, ans=0.0 2023-11-26 06:56:09,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3271693.3333333335, ans=0.0 2023-11-26 06:56:15,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3271760.0, ans=0.125 2023-11-26 06:56:17,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5 2023-11-26 06:56:55,072 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:56:55,107 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490800 2023-11-26 06:56:57,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3271960.0, ans=0.125 2023-11-26 06:57:01,816 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9850, loss[loss=0.06985, simple_loss=0.1018, pruned_loss=0.01193, audio_tagging_loss=0.007006, over 16659.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08963, pruned_loss=0.01229, audio_tagging_loss=0.008597, over 3033111.18 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:57:17,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3272093.3333333335, ans=0.125 2023-11-26 06:57:51,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490850 2023-11-26 06:57:53,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3272293.3333333335, ans=0.1 2023-11-26 06:57:56,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.658e+01 9.556e+01 1.029e+02 1.537e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 06:57:58,695 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9900, loss[loss=0.06685, simple_loss=0.09242, pruned_loss=0.01338, audio_tagging_loss=0.007261, over 14365.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09012, pruned_loss=0.01238, audio_tagging_loss=0.008593, over 3033315.07 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:58:10,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3272426.6666666665, ans=0.125 2023-11-26 06:58:11,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3272426.6666666665, ans=0.0 2023-11-26 06:58:14,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. 
limit=15.0 2023-11-26 06:58:16,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3272426.6666666665, ans=0.125 2023-11-26 06:58:22,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2023-11-26 06:58:23,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3272493.3333333335, ans=0.125 2023-11-26 06:58:39,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3272560.0, ans=0.125 2023-11-26 06:58:45,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3272626.6666666665, ans=0.125 2023-11-26 06:58:48,418 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490900 2023-11-26 06:58:55,365 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9950, loss[loss=0.06446, simple_loss=0.08357, pruned_loss=0.01197, audio_tagging_loss=0.0107, over 15582.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08974, pruned_loss=0.01224, audio_tagging_loss=0.008679, over 3038377.76 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:58:57,757 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:59:15,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3272760.0, ans=0.125 2023-11-26 06:59:22,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3272826.6666666665, ans=0.1 2023-11-26 06:59:44,370 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490950 2023-11-26 06:59:48,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.546e+01 9.420e+01 1.008e+02 1.364e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 06:59:50,749 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10000, loss[loss=0.05557, simple_loss=0.08191, pruned_loss=0.007297, audio_tagging_loss=0.007322, over 15406.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.0887, pruned_loss=0.01222, audio_tagging_loss=0.008773, over 3040131.40 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:00:07,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3273093.3333333335, ans=0.125 2023-11-26 07:00:17,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3273160.0, ans=0.125 2023-11-26 07:00:18,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3273160.0, ans=0.1 2023-11-26 07:00:26,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3273226.6666666665, ans=10.0 2023-11-26 07:00:40,158 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491000 2023-11-26 07:00:47,309 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10050, loss[loss=0.05914, simple_loss=0.08082, pruned_loss=0.009138, audio_tagging_loss=0.009588, over 13659.00 frames. 
], tot_loss[loss=0.06554, simple_loss=0.08882, pruned_loss=0.01233, audio_tagging_loss=0.008811, over 3032175.73 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:00:50,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3273360.0, ans=0.0 2023-11-26 07:00:51,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3273360.0, ans=0.125 2023-11-26 07:01:01,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3273426.6666666665, ans=0.125 2023-11-26 07:01:05,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-11-26 07:01:07,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3273426.6666666665, ans=0.0 2023-11-26 07:01:09,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3273493.3333333335, ans=0.125 2023-11-26 07:01:22,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3273560.0, ans=0.0 2023-11-26 07:01:32,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3273626.6666666665, ans=0.1 2023-11-26 07:01:36,853 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491050 2023-11-26 07:01:41,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.461e+01 9.073e+01 9.880e+01 1.259e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-26 07:01:42,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3273693.3333333335, ans=0.1 2023-11-26 07:01:43,282 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10100, loss[loss=0.0486, simple_loss=0.05507, pruned_loss=0.007944, audio_tagging_loss=0.01313, over 13919.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08869, pruned_loss=0.01217, audio_tagging_loss=0.008826, over 3033942.28 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:01:48,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3273693.3333333335, ans=0.0 2023-11-26 07:02:13,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3273826.6666666665, ans=0.2 2023-11-26 07:02:14,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3273826.6666666665, ans=0.95 2023-11-26 07:02:19,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3273893.3333333335, ans=15.0 2023-11-26 07:02:28,946 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 07:02:32,757 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491100 2023-11-26 07:02:34,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3273960.0, ans=0.0 2023-11-26 07:02:39,072 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10150, loss[loss=0.0657, simple_loss=0.09006, pruned_loss=0.01193, audio_tagging_loss=0.008734, over 15586.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09016, pruned_loss=0.01245, audio_tagging_loss=0.008766, over 3043108.46 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:02:43,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3274026.6666666665, ans=0.2 2023-11-26 07:03:06,042 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:03:19,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3274226.6666666665, ans=0.125 2023-11-26 07:03:28,332 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491150 2023-11-26 07:03:32,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.834e+01 9.375e+01 1.026e+02 1.327e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 07:03:34,514 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10200, loss[loss=0.0436, simple_loss=0.0569, pruned_loss=0.007536, audio_tagging_loss=0.007613, over 14766.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09056, pruned_loss=0.0125, audio_tagging_loss=0.00891, over 3038270.09 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:03:34,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3274360.0, ans=0.125 2023-11-26 07:03:46,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=12.0 2023-11-26 07:03:55,601 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:04:19,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3274626.6666666665, ans=0.1 2023-11-26 07:04:23,780 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491200 2023-11-26 07:04:30,704 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10250, loss[loss=0.07054, simple_loss=0.09626, pruned_loss=0.01338, audio_tagging_loss=0.009029, over 15506.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08973, pruned_loss=0.01241, audio_tagging_loss=0.009008, over 3045301.07 frames. 
], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:04:37,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3274693.3333333335, ans=0.2 2023-11-26 07:04:55,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3274826.6666666665, ans=0.125 2023-11-26 07:04:58,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3274826.6666666665, ans=0.0 2023-11-26 07:05:19,440 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491250 2023-11-26 07:05:23,637 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.938e+01 9.745e+01 1.064e+02 1.415e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-26 07:05:25,868 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10300, loss[loss=0.05771, simple_loss=0.08018, pruned_loss=0.008428, audio_tagging_loss=0.009189, over 15746.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08982, pruned_loss=0.01253, audio_tagging_loss=0.008973, over 3041829.14 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:05:28,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3275026.6666666665, ans=0.125 2023-11-26 07:05:38,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3275093.3333333335, ans=0.0 2023-11-26 07:05:47,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3275160.0, ans=0.1 2023-11-26 07:06:15,263 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491300 2023-11-26 07:06:22,359 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10350, loss[loss=0.08092, simple_loss=0.1027, pruned_loss=0.01964, audio_tagging_loss=0.009942, over 14736.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09115, pruned_loss=0.01266, audio_tagging_loss=0.009008, over 3044392.29 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:06:35,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3275426.6666666665, ans=0.125 2023-11-26 07:06:41,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3275426.6666666665, ans=0.0 2023-11-26 07:06:52,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3275493.3333333335, ans=0.2 2023-11-26 07:06:52,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2023-11-26 07:07:10,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. 
limit=6.0 2023-11-26 07:07:11,735 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491350 2023-11-26 07:07:12,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275626.6666666665, ans=0.1 2023-11-26 07:07:16,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.783e+01 9.372e+01 1.013e+02 2.774e+02, threshold=1.874e+02, percent-clipped=1.0 2023-11-26 07:07:18,536 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10400, loss[loss=0.05986, simple_loss=0.08056, pruned_loss=0.01046, audio_tagging_loss=0.009123, over 16199.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09046, pruned_loss=0.01263, audio_tagging_loss=0.009139, over 3041987.20 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:07:29,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3275760.0, ans=0.125 2023-11-26 07:07:41,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3275826.6666666665, ans=0.125 2023-11-26 07:07:47,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3275826.6666666665, ans=0.2 2023-11-26 07:07:49,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3275826.6666666665, ans=0.125 2023-11-26 07:07:52,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3275893.3333333335, ans=0.125 2023-11-26 07:07:58,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=22.5 2023-11-26 07:08:07,541 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491400 2023-11-26 07:08:14,169 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10450, loss[loss=0.06684, simple_loss=0.09291, pruned_loss=0.01257, audio_tagging_loss=0.00782, over 14766.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09062, pruned_loss=0.0127, audio_tagging_loss=0.009088, over 3040401.72 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:08:30,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3276093.3333333335, ans=0.0 2023-11-26 07:08:31,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3276093.3333333335, ans=0.1 2023-11-26 07:08:33,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. 
limit=15.0 2023-11-26 07:08:34,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3276093.3333333335, ans=0.04949747468305833 2023-11-26 07:08:38,427 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:08:46,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3276160.0, ans=0.125 2023-11-26 07:09:03,206 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491450 2023-11-26 07:09:07,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.707e+01 9.260e+01 9.868e+01 1.345e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 07:09:10,553 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10500, loss[loss=0.05954, simple_loss=0.08614, pruned_loss=0.00862, audio_tagging_loss=0.007848, over 15785.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08989, pruned_loss=0.01254, audio_tagging_loss=0.009023, over 3049048.44 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:09:14,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3276360.0, ans=0.035 2023-11-26 07:09:24,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3276426.6666666665, ans=0.0 2023-11-26 07:09:32,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3276493.3333333335, ans=0.1 2023-11-26 07:09:34,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3276493.3333333335, ans=0.125 2023-11-26 07:09:59,903 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491500 2023-11-26 07:10:06,827 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10550, loss[loss=0.07159, simple_loss=0.08865, pruned_loss=0.01712, audio_tagging_loss=0.01014, over 15220.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08999, pruned_loss=0.01261, audio_tagging_loss=0.00898, over 3045779.04 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:10:10,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2023-11-26 07:10:27,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3276826.6666666665, ans=0.04949747468305833 2023-11-26 07:10:35,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3276826.6666666665, ans=0.125 2023-11-26 07:10:42,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3276893.3333333335, ans=0.0 2023-11-26 07:10:53,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.72 vs. 
limit=15.0 2023-11-26 07:10:55,639 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491550 2023-11-26 07:11:00,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.562e+01 9.260e+01 9.916e+01 1.260e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 07:11:01,906 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10600, loss[loss=0.07419, simple_loss=0.1053, pruned_loss=0.0158, audio_tagging_loss=0.005737, over 15075.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08995, pruned_loss=0.01265, audio_tagging_loss=0.008877, over 3041983.15 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:11:10,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3277026.6666666665, ans=0.125 2023-11-26 07:11:21,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=12.0 2023-11-26 07:11:34,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3277226.6666666665, ans=0.125 2023-11-26 07:11:46,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3277293.3333333335, ans=0.125 2023-11-26 07:11:50,594 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491600 2023-11-26 07:11:57,754 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10650, loss[loss=0.07148, simple_loss=0.1018, pruned_loss=0.01402, audio_tagging_loss=0.006537, over 14862.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09102, pruned_loss=0.01277, audio_tagging_loss=0.008827, over 3042158.31 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:12:02,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3277360.0, ans=0.0 2023-11-26 07:12:06,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3277360.0, ans=0.125 2023-11-26 07:12:13,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3277426.6666666665, ans=0.0 2023-11-26 07:12:18,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3277426.6666666665, ans=0.125 2023-11-26 07:12:30,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3277560.0, ans=0.125 2023-11-26 07:12:35,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.88 vs. 
limit=22.5 2023-11-26 07:12:41,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3277626.6666666665, ans=0.125 2023-11-26 07:12:42,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3277626.6666666665, ans=0.125 2023-11-26 07:12:46,774 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491650 2023-11-26 07:12:53,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.757e+01 9.487e+01 1.015e+02 1.210e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 07:12:53,571 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10700, loss[loss=0.06657, simple_loss=0.08856, pruned_loss=0.01383, audio_tagging_loss=0.008459, over 16051.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09107, pruned_loss=0.01267, audio_tagging_loss=0.008827, over 3048690.44 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 07:12:53,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3277693.3333333335, ans=0.125 2023-11-26 07:13:07,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3277760.0, ans=0.125 2023-11-26 07:13:07,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3277760.0, ans=0.09899494936611666 2023-11-26 07:13:08,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3277760.0, ans=0.125 2023-11-26 07:13:16,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3277826.6666666665, ans=0.0 2023-11-26 07:13:19,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3277826.6666666665, ans=0.125 2023-11-26 07:13:42,683 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491700 2023-11-26 07:13:44,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3277960.0, ans=0.125 2023-11-26 07:13:48,938 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10750, loss[loss=0.05639, simple_loss=0.06904, pruned_loss=0.01277, audio_tagging_loss=0.009103, over 15229.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09115, pruned_loss=0.01281, audio_tagging_loss=0.008799, over 3044668.19 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 07:13:54,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. 
limit=6.0 2023-11-26 07:14:01,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3278093.3333333335, ans=0.125 2023-11-26 07:14:07,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3278093.3333333335, ans=0.0 2023-11-26 07:14:26,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3278226.6666666665, ans=0.125 2023-11-26 07:14:33,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3278293.3333333335, ans=0.125 2023-11-26 07:14:33,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2023-11-26 07:14:35,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0 2023-11-26 07:14:37,765 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491750 2023-11-26 07:14:37,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3278293.3333333335, ans=0.125 2023-11-26 07:14:38,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3278293.3333333335, ans=0.0 2023-11-26 07:14:39,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3278293.3333333335, ans=0.125 2023-11-26 07:14:44,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.438e+01 9.296e+01 1.012e+02 1.543e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 07:14:44,194 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10800, loss[loss=0.06985, simple_loss=0.08628, pruned_loss=0.01338, audio_tagging_loss=0.01333, over 15083.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09003, pruned_loss=0.01263, audio_tagging_loss=0.00886, over 3043527.44 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:14:55,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2023-11-26 07:15:00,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.83 vs. 
limit=15.0 2023-11-26 07:15:02,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3278426.6666666665, ans=0.125 2023-11-26 07:15:16,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3278493.3333333335, ans=0.1 2023-11-26 07:15:18,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3278560.0, ans=0.125 2023-11-26 07:15:33,508 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491800 2023-11-26 07:15:37,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3278626.6666666665, ans=0.0 2023-11-26 07:15:38,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3278626.6666666665, ans=0.125 2023-11-26 07:15:39,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3278693.3333333335, ans=0.2 2023-11-26 07:15:41,211 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10850, loss[loss=0.05576, simple_loss=0.06967, pruned_loss=0.01123, audio_tagging_loss=0.009698, over 15483.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09012, pruned_loss=0.0126, audio_tagging_loss=0.008784, over 3042726.80 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:15:54,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0 2023-11-26 07:15:58,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3278760.0, ans=0.125 2023-11-26 07:15:59,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.71 vs. limit=22.5 2023-11-26 07:16:00,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3278760.0, ans=0.0 2023-11-26 07:16:07,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3278826.6666666665, ans=0.2 2023-11-26 07:16:11,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3278826.6666666665, ans=0.125 2023-11-26 07:16:24,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3278960.0, ans=0.125 2023-11-26 07:16:28,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3278960.0, ans=0.0 2023-11-26 07:16:30,341 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491850 2023-11-26 07:16:33,486 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 07:16:36,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.766e+01 9.451e+01 1.013e+02 1.235e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 07:16:36,693 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10900, loss[loss=0.05802, simple_loss=0.07071, pruned_loss=0.007876, audio_tagging_loss=0.01479, over 14824.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08962, pruned_loss=0.01238, audio_tagging_loss=0.008815, over 3039064.97 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:16:42,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3279026.6666666665, ans=0.125 2023-11-26 07:16:45,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3279026.6666666665, ans=0.1 2023-11-26 07:16:56,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3279093.3333333335, ans=0.125 2023-11-26 07:17:00,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=12.0 2023-11-26 07:17:14,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0 2023-11-26 07:17:16,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3279226.6666666665, ans=0.125 2023-11-26 07:17:25,306 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491900 2023-11-26 07:17:26,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3279293.3333333335, ans=0.125 2023-11-26 07:17:28,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3279293.3333333335, ans=0.2 2023-11-26 07:17:31,494 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10950, loss[loss=0.06253, simple_loss=0.08646, pruned_loss=0.009583, audio_tagging_loss=0.009712, over 16462.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09017, pruned_loss=0.01247, audio_tagging_loss=0.008788, over 3037538.56 frames. 
], batch size: 61, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:17:37,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3279360.0, ans=0.0 2023-11-26 07:17:40,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3279360.0, ans=0.125 2023-11-26 07:17:41,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3279426.6666666665, ans=0.125 2023-11-26 07:17:44,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3279426.6666666665, ans=0.05 2023-11-26 07:17:53,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3279493.3333333335, ans=0.2 2023-11-26 07:17:57,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3279493.3333333335, ans=0.0 2023-11-26 07:17:58,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3279493.3333333335, ans=0.0 2023-11-26 07:18:04,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3279560.0, ans=0.1 2023-11-26 07:18:07,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=12.0 2023-11-26 07:18:20,739 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491950 2023-11-26 07:18:22,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3279626.6666666665, ans=0.125 2023-11-26 07:18:27,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 8.778e+01 9.414e+01 1.024e+02 1.293e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 07:18:27,625 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11000, loss[loss=0.05595, simple_loss=0.07639, pruned_loss=0.009293, audio_tagging_loss=0.008466, over 14785.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08959, pruned_loss=0.01232, audio_tagging_loss=0.008932, over 3041001.99 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:18:34,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3279693.3333333335, ans=0.125 2023-11-26 07:18:36,600 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
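The optim.py:476 lines document adaptive gradient clipping: the five "grad-norm quartiles" read as the 0th/25th/50th/75th/100th percentiles of recently observed gradient norms, and the threshold is consistently Clipping_scale times the median (here 2.0 x 9.414e+01 = 1.883e+02); "percent-clipped" is the fraction of recent batches that actually hit the threshold. A sketch of the scheme, with the window size and update cadence as assumptions:

    from collections import deque
    import torch

    class MedianGradClipper:
        """Clip gradients at clipping_scale * median of recent norms."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)

        def __call__(self, params) -> float:
            params = [p for p in params if p.grad is not None]
            norm = torch.cat([p.grad.flatten() for p in params]).norm().item()
            self.norms.append(norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.scale * median   # e.g. 2.0 * 9.414e+01
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return norm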
Number of tokens: 24 2023-11-26 07:18:45,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3279760.0, ans=0.1 2023-11-26 07:18:58,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3279826.6666666665, ans=0.125 2023-11-26 07:19:16,801 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492000 2023-11-26 07:19:25,738 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11050, loss[loss=0.0543, simple_loss=0.06764, pruned_loss=0.01007, audio_tagging_loss=0.01042, over 14642.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09059, pruned_loss=0.01248, audio_tagging_loss=0.008864, over 3048470.93 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:19:27,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.11 vs. limit=10.0 2023-11-26 07:19:36,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=12.0 2023-11-26 07:19:48,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3280160.0, ans=0.2 2023-11-26 07:20:09,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-26 07:20:14,523 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492050 2023-11-26 07:20:19,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3280360.0, ans=0.2 2023-11-26 07:20:20,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.875e+01 9.418e+01 1.004e+02 1.333e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 07:20:20,722 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11100, loss[loss=0.07197, simple_loss=0.08423, pruned_loss=0.01863, audio_tagging_loss=0.01123, over 13831.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08983, pruned_loss=0.01237, audio_tagging_loss=0.008931, over 3046949.12 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:20:22,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3280360.0, ans=0.125 2023-11-26 07:20:25,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3280360.0, ans=0.2 2023-11-26 07:20:31,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3280426.6666666665, ans=0.125 2023-11-26 07:20:33,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3280426.6666666665, ans=0.125 2023-11-26 07:20:38,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3280426.6666666665, ans=0.1 2023-11-26 07:20:39,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3280426.6666666665, ans=0.0 2023-11-26 07:20:45,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. 
limit=6.0 2023-11-26 07:20:54,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3280560.0, ans=0.1 2023-11-26 07:21:04,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3280626.6666666665, ans=0.2 2023-11-26 07:21:06,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3280626.6666666665, ans=0.125 2023-11-26 07:21:09,980 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492100 2023-11-26 07:21:16,282 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11150, loss[loss=0.06651, simple_loss=0.08734, pruned_loss=0.01327, audio_tagging_loss=0.009564, over 15402.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08984, pruned_loss=0.01246, audio_tagging_loss=0.008952, over 3054822.58 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:21:22,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3280693.3333333335, ans=0.0 2023-11-26 07:21:27,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3280760.0, ans=0.125 2023-11-26 07:21:34,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3280760.0, ans=0.1 2023-11-26 07:21:49,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3280893.3333333335, ans=0.09899494936611666 2023-11-26 07:22:05,965 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492150 2023-11-26 07:22:08,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3280960.0, ans=0.125 2023-11-26 07:22:12,761 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 8.937e+01 9.375e+01 1.012e+02 1.316e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 07:22:12,788 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11200, loss[loss=0.07005, simple_loss=0.09652, pruned_loss=0.01248, audio_tagging_loss=0.009307, over 16515.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08932, pruned_loss=0.0124, audio_tagging_loss=0.009099, over 3048998.11 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:22:20,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3281026.6666666665, ans=0.2 2023-11-26 07:22:23,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2023-11-26 07:22:36,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3281160.0, ans=0.125 2023-11-26 07:22:59,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3281293.3333333335, ans=0.0 2023-11-26 07:23:01,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. 
limit=15.0 2023-11-26 07:23:01,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492200 2023-11-26 07:23:08,504 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11250, loss[loss=0.05466, simple_loss=0.07703, pruned_loss=0.006909, audio_tagging_loss=0.009239, over 15896.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08913, pruned_loss=0.01254, audio_tagging_loss=0.009167, over 3054290.68 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:23:37,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3281493.3333333335, ans=0.125 2023-11-26 07:23:38,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3281493.3333333335, ans=10.0 2023-11-26 07:23:57,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492250 2023-11-26 07:24:03,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2023-11-26 07:24:04,032 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.640e+01 9.467e+01 1.012e+02 1.426e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 07:24:04,059 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11300, loss[loss=0.06125, simple_loss=0.08348, pruned_loss=0.009447, audio_tagging_loss=0.01007, over 14425.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08908, pruned_loss=0.01246, audio_tagging_loss=0.009046, over 3048170.77 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:24:08,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3281693.3333333335, ans=0.0 2023-11-26 07:24:13,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3281693.3333333335, ans=0.035 2023-11-26 07:24:34,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3281826.6666666665, ans=0.2 2023-11-26 07:24:38,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3281893.3333333335, ans=0.125 2023-11-26 07:24:49,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2023-11-26 07:24:53,325 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492300 2023-11-26 07:24:54,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3281960.0, ans=0.0 2023-11-26 07:25:00,212 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11350, loss[loss=0.05101, simple_loss=0.06642, pruned_loss=0.00907, audio_tagging_loss=0.008726, over 13539.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08873, pruned_loss=0.01255, audio_tagging_loss=0.008972, over 3046661.91 frames. 
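Each scaling.py:213 line is a ScheduledFloat lookup: module hyperparameters such as dropout_p, skip rates, bypass scale_min and balancer probabilities are functions of the global batch_count rather than constants, and `ans` is the value in effect for the current batch. At batch_count around 3.28M every schedule appears to have settled on its final value (0.125, 0.0, 0.2, and so on). A minimal sketch of a piecewise-linear schedule of this kind; the breakpoints below are purely illustrative:

    import bisect

    class ScheduledFloat:
        """Piecewise-linear function of batch_count, held constant
        outside the first and last breakpoints."""

        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Illustrative: a dropout that anneals from 0.3 to 0.125, then holds.
    prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
    assert prob(3278826.0) == 0.125   # far past the last breakpoint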
], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:25:09,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3282093.3333333335, ans=0.1 2023-11-26 07:25:40,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3282226.6666666665, ans=0.125 2023-11-26 07:25:41,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2023-11-26 07:25:48,942 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492350 2023-11-26 07:25:55,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.982e+01 9.660e+01 1.025e+02 3.694e+02, threshold=1.932e+02, percent-clipped=1.0 2023-11-26 07:25:55,319 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11400, loss[loss=0.06704, simple_loss=0.09199, pruned_loss=0.0142, audio_tagging_loss=0.006849, over 16084.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08894, pruned_loss=0.01236, audio_tagging_loss=0.008924, over 3048870.29 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:26:00,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3282360.0, ans=0.0 2023-11-26 07:26:08,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3282426.6666666665, ans=0.0 2023-11-26 07:26:13,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.63 vs. limit=15.0 2023-11-26 07:26:16,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2023-11-26 07:26:20,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3282493.3333333335, ans=0.0 2023-11-26 07:26:26,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3282493.3333333335, ans=0.125 2023-11-26 07:26:39,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2023-11-26 07:26:40,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3282626.6666666665, ans=0.125 2023-11-26 07:26:44,822 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492400 2023-11-26 07:26:51,867 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11450, loss[loss=0.08268, simple_loss=0.1072, pruned_loss=0.01966, audio_tagging_loss=0.00942, over 15796.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08878, pruned_loss=0.01232, audio_tagging_loss=0.008937, over 3042990.82 frames. 
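The four numbers inside each loss[...] / tot_loss[...] bracket are not independent: throughout these lines, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. a half-weighted simple (linear-lattice) transducer loss plus the fully weighted pruned RNN-T loss plus the audio-tagging distillation term. The batch 11450 aggregate just above checks out to rounding:

    # Check against tot_loss of Epoch 41, batch 11450.
    simple_loss, pruned_loss, at_loss = 0.08878, 0.01232, 0.008937
    loss = 0.5 * simple_loss + pruned_loss + at_loss   # = 0.065647
    assert abs(loss - 0.06565) < 5e-5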
], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:27:22,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3282826.6666666665, ans=0.1 2023-11-26 07:27:32,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3282893.3333333335, ans=0.125 2023-11-26 07:27:33,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3282893.3333333335, ans=0.0 2023-11-26 07:27:34,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0 2023-11-26 07:27:40,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492450 2023-11-26 07:27:47,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.866e+01 9.675e+01 1.039e+02 1.240e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-26 07:27:47,937 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11500, loss[loss=0.0579, simple_loss=0.07235, pruned_loss=0.01213, audio_tagging_loss=0.009601, over 16040.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08937, pruned_loss=0.01242, audio_tagging_loss=0.008853, over 3051046.93 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:28:02,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3283093.3333333335, ans=0.0 2023-11-26 07:28:15,796 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:28:17,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-26 07:28:18,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3283160.0, ans=0.0 2023-11-26 07:28:36,730 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492500 2023-11-26 07:28:42,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3283360.0, ans=0.125 2023-11-26 07:28:42,923 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11550, loss[loss=0.06651, simple_loss=0.09429, pruned_loss=0.01138, audio_tagging_loss=0.007978, over 15247.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09018, pruned_loss=0.01255, audio_tagging_loss=0.008747, over 3054414.52 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:28:45,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3283360.0, ans=0.125 2023-11-26 07:29:12,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3283493.3333333335, ans=0.125 2023-11-26 07:29:16,759 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 07:29:21,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=15.0 2023-11-26 07:29:31,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492550 2023-11-26 07:29:33,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3283626.6666666665, ans=0.09899494936611666 2023-11-26 07:29:38,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.923e+01 9.634e+01 1.014e+02 1.304e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-26 07:29:38,948 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11600, loss[loss=0.05786, simple_loss=0.06837, pruned_loss=0.01246, audio_tagging_loss=0.01121, over 15947.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09009, pruned_loss=0.01254, audio_tagging_loss=0.008714, over 3062154.39 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:29:46,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3283693.3333333335, ans=0.125 2023-11-26 07:29:51,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3283760.0, ans=0.0 2023-11-26 07:29:55,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3283760.0, ans=0.0 2023-11-26 07:30:16,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3283893.3333333335, ans=0.125 2023-11-26 07:30:27,723 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492600 2023-11-26 07:30:34,802 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11650, loss[loss=0.06217, simple_loss=0.07936, pruned_loss=0.01276, audio_tagging_loss=0.009725, over 16334.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08985, pruned_loss=0.01245, audio_tagging_loss=0.008778, over 3060390.73 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:30:48,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3284093.3333333335, ans=0.125 2023-11-26 07:31:09,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3284226.6666666665, ans=0.2 2023-11-26 07:31:23,686 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492650 2023-11-26 07:31:23,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3284293.3333333335, ans=0.0 2023-11-26 07:31:29,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.493e+01 9.108e+01 9.754e+01 1.305e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 07:31:29,961 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11700, loss[loss=0.05501, simple_loss=0.07315, pruned_loss=0.008428, audio_tagging_loss=0.01, over 14885.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08903, pruned_loss=0.01233, audio_tagging_loss=0.008783, over 3049731.44 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:31:36,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. 
limit=15.0 2023-11-26 07:32:01,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3284493.3333333335, ans=0.125 2023-11-26 07:32:17,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2023-11-26 07:32:18,471 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492700 2023-11-26 07:32:24,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-26 07:32:24,688 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11750, loss[loss=0.06782, simple_loss=0.09506, pruned_loss=0.01365, audio_tagging_loss=0.006634, over 15251.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08996, pruned_loss=0.01246, audio_tagging_loss=0.008813, over 3047690.55 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:32:36,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3284760.0, ans=0.125 2023-11-26 07:33:14,319 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492750 2023-11-26 07:33:21,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.644e+01 9.343e+01 1.016e+02 1.345e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 07:33:21,118 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11800, loss[loss=0.07787, simple_loss=0.1094, pruned_loss=0.01532, audio_tagging_loss=0.007873, over 14810.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08988, pruned_loss=0.01246, audio_tagging_loss=0.008855, over 3043148.05 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:33:28,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3285026.6666666665, ans=0.0 2023-11-26 07:33:29,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3285026.6666666665, ans=0.125 2023-11-26 07:33:37,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3285093.3333333335, ans=0.1 2023-11-26 07:33:38,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3285093.3333333335, ans=0.125 2023-11-26 07:34:04,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3285293.3333333335, ans=0.125 2023-11-26 07:34:10,108 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492800 2023-11-26 07:34:16,632 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11850, loss[loss=0.05247, simple_loss=0.07154, pruned_loss=0.00746, audio_tagging_loss=0.009238, over 15220.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08937, pruned_loss=0.01238, audio_tagging_loss=0.008907, over 3047021.61 frames. 
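The scaling.py:1022 "Whitening" lines track how far a layer's activations are from having a white (isotropic) covariance. The metric is normalized so that perfectly white features give 1.0 and anisotropic ones give larger values; nothing is penalized while the metric stays under the scheduled limit, hence the recurring "metric=... vs. limit=..." phrasing. One eigendecomposition-free way to compute such a metric, offered as a sketch of the idea rather than the exact scaling.py code:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """num_channels * trace(cov @ cov) / trace(cov) ** 2, which equals
        E[eigenvalue^2] / E[eigenvalue]^2 and is 1.0 iff cov = c * I."""
        x = x.reshape(-1, x.shape[-1]).to(torch.float32)
        cov = x.t() @ x / x.shape[0]
        c = cov.shape[0]
        return (c * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

    white = torch.randn(10000, 256)                  # ~isotropic features
    skewed = white * torch.linspace(0.1, 3.0, 256)   # unequal channel scales
    assert whitening_metric(white) < whitening_metric(skewed)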
], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:34:19,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3285360.0, ans=0.0 2023-11-26 07:34:31,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3285426.6666666665, ans=0.0 2023-11-26 07:34:45,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3285493.3333333335, ans=0.125 2023-11-26 07:35:05,290 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492850 2023-11-26 07:35:08,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2023-11-26 07:35:11,679 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11900, loss[loss=0.07408, simple_loss=0.1071, pruned_loss=0.01131, audio_tagging_loss=0.009252, over 15395.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08956, pruned_loss=0.01228, audio_tagging_loss=0.008978, over 3045068.95 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:35:12,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.644e+01 9.176e+01 9.875e+01 1.365e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-26 07:35:36,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3285826.6666666665, ans=0.125 2023-11-26 07:35:44,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2023-11-26 07:36:00,475 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492900 2023-11-26 07:36:05,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3286026.6666666665, ans=0.125 2023-11-26 07:36:07,290 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11950, loss[loss=0.07362, simple_loss=0.108, pruned_loss=0.01358, audio_tagging_loss=0.006036, over 16117.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08975, pruned_loss=0.01231, audio_tagging_loss=0.009083, over 3054729.52 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:36:11,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3286026.6666666665, ans=0.125 2023-11-26 07:36:13,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3286026.6666666665, ans=0.125 2023-11-26 07:36:43,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3286226.6666666665, ans=0.0 2023-11-26 07:36:54,472 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492950 2023-11-26 07:36:54,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3286293.3333333335, ans=0.0 2023-11-26 07:36:57,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3286293.3333333335, ans=0.0 2023-11-26 07:37:00,470 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 12000, loss[loss=0.0659, simple_loss=0.0836, pruned_loss=0.01281, audio_tagging_loss=0.01128, over 15500.00 frames. 
], tot_loss[loss=0.06646, simple_loss=0.09001, pruned_loss=0.01233, audio_tagging_loss=0.009132, over 3054818.94 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:37:00,471 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 07:37:33,025 INFO [train_asr.py:1267] (1/4) Epoch 41, validation: loss=0.05803, simple_loss=0.05068, pruned_loss=0.005323, audio_tagging_loss=0.02736, over 4681554.00 frames. 2023-11-26 07:37:33,026 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 07:37:35,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.785e+01 9.392e+01 1.025e+02 1.388e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 07:38:28,140 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 0, loss[loss=0.06709, simple_loss=0.0683, pruned_loss=0.007928, audio_tagging_loss=0.02502, over 15604.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.0683, pruned_loss=0.007928, audio_tagging_loss=0.02502, over 15604.00 frames. ], batch size: 64, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:38:28,141 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 07:38:41,097 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4172, 3.2499, 3.8499, 3.5286], device='cuda:1') 2023-11-26 07:38:59,431 INFO [train_asr.py:1267] (1/4) Epoch 42, validation: loss=0.05791, simple_loss=0.05064, pruned_loss=0.005256, audio_tagging_loss=0.02733, over 4681554.00 frames. 2023-11-26 07:38:59,432 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 07:39:00,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3286513.3333333335, ans=0.125 2023-11-26 07:39:01,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3286513.3333333335, ans=0.1 2023-11-26 07:39:01,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3286513.3333333335, ans=0.0 2023-11-26 07:39:23,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493000 2023-11-26 07:39:28,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.94 vs. limit=15.0 2023-11-26 07:39:40,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3286713.3333333335, ans=0.0 2023-11-26 07:39:55,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3286846.6666666665, ans=0.125 2023-11-26 07:39:56,097 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 50, loss[loss=0.07783, simple_loss=0.1004, pruned_loss=0.01404, audio_tagging_loss=0.01362, over 15639.00 frames. ], tot_loss[loss=0.07401, simple_loss=0.09068, pruned_loss=0.01235, audio_tagging_loss=0.01632, over 694660.46 frames. 
], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:40:18,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3286980.0, ans=0.125 2023-11-26 07:40:19,861 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493050 2023-11-26 07:40:23,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3286980.0, ans=0.125 2023-11-26 07:40:29,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 9.630e+01 1.022e+02 1.088e+02 1.448e+02, threshold=2.045e+02, percent-clipped=0.0 2023-11-26 07:40:31,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2023-11-26 07:40:42,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5 2023-11-26 07:40:52,962 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 100, loss[loss=0.07399, simple_loss=0.09664, pruned_loss=0.01121, audio_tagging_loss=0.01447, over 15578.00 frames. ], tot_loss[loss=0.07466, simple_loss=0.09201, pruned_loss=0.0127, audio_tagging_loss=0.01595, over 1212073.94 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:41:10,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3287246.6666666665, ans=0.1 2023-11-26 07:41:16,409 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493100 2023-11-26 07:41:34,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3287380.0, ans=0.0 2023-11-26 07:41:48,980 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 150, loss[loss=0.0766, simple_loss=0.09734, pruned_loss=0.01425, audio_tagging_loss=0.01368, over 15829.00 frames. ], tot_loss[loss=0.07241, simple_loss=0.09148, pruned_loss=0.01227, audio_tagging_loss=0.01439, over 1618034.02 frames. 
], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:41:55,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3287513.3333333335, ans=0.0 2023-11-26 07:42:02,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3287580.0, ans=0.125 2023-11-26 07:42:12,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3287646.6666666665, ans=0.125 2023-11-26 07:42:13,184 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493150 2023-11-26 07:42:20,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3287646.6666666665, ans=0.0 2023-11-26 07:42:21,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 9.081e+01 9.641e+01 1.033e+02 1.343e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 07:42:41,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3287780.0, ans=0.0 2023-11-26 07:42:44,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3287846.6666666665, ans=0.0 2023-11-26 07:42:44,916 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 200, loss[loss=0.07466, simple_loss=0.09585, pruned_loss=0.01904, audio_tagging_loss=0.007696, over 15143.00 frames. ], tot_loss[loss=0.0694, simple_loss=0.08876, pruned_loss=0.01202, audio_tagging_loss=0.01299, over 1936026.29 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:43:08,501 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493200 2023-11-26 07:43:18,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3288046.6666666665, ans=0.0 2023-11-26 07:43:27,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3288046.6666666665, ans=0.2 2023-11-26 07:43:32,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3288113.3333333335, ans=0.2 2023-11-26 07:43:41,878 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 250, loss[loss=0.07394, simple_loss=0.09844, pruned_loss=0.01584, audio_tagging_loss=0.008875, over 15631.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.08961, pruned_loss=0.01238, audio_tagging_loss=0.01194, over 2180273.29 frames. 
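The "over N frames" counts explain the loss trajectory across the epoch boundary: during epoch 41 tot_loss is reported over a steady ~3.0M frames, while after the restart it climbs from ~0.7M (batch 50) through ~2.2M (batch 250). This is consistent with tot_loss being a frame-weighted aggregate with exponential decay rather than a plain epoch mean; at roughly 15k frames per batch, a 3.0M-frame plateau implies a decay constant of about 200 batches (an inference from the logged numbers, not confirmed code). It also explains why audio_tagging_loss looks inflated right after the restart: the window is still dominated by the first few batches. A sketch:

    class RunningLoss:
        """Frame-weighted loss aggregate with exponential decay.

        The frame count plateaus near decay * frames_per_batch, matching
        the ~3.0M-frame figures in the epoch-41 lines for decay ~ 200."""

        def __init__(self, decay: float = 200.0):
            self.decay, self.loss_sum, self.frames = decay, 0.0, 0.0

        def update(self, batch_loss_sum: float, batch_frames: float):
            keep = 1.0 - 1.0 / self.decay
            self.loss_sum = self.loss_sum * keep + batch_loss_sum
            self.frames = self.frames * keep + batch_frames

        @property
        def loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    for _ in range(250):
        tracker.update(0.066 * 15000, 15000.0)
    print(round(tracker.frames))   # ~2.1M, still climbing toward 3.0M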
], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:43:42,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3288180.0, ans=0.07 2023-11-26 07:43:42,136 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:43:43,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3288180.0, ans=0.125 2023-11-26 07:43:45,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3288180.0, ans=0.0 2023-11-26 07:43:51,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3288246.6666666665, ans=0.125 2023-11-26 07:44:04,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3288313.3333333335, ans=0.0 2023-11-26 07:44:05,398 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493250 2023-11-26 07:44:14,640 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.995e+01 8.741e+01 9.429e+01 1.027e+02 1.277e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 07:44:24,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3288380.0, ans=0.2 2023-11-26 07:44:33,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3288446.6666666665, ans=0.125 2023-11-26 07:44:34,876 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:44:37,412 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 300, loss[loss=0.04806, simple_loss=0.06845, pruned_loss=0.004127, audio_tagging_loss=0.009711, over 14508.00 frames. ], tot_loss[loss=0.06858, simple_loss=0.09015, pruned_loss=0.01254, audio_tagging_loss=0.01097, over 2373457.15 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:44:39,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=12.0 2023-11-26 07:45:00,696 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493300 2023-11-26 07:45:16,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3288713.3333333335, ans=0.0 2023-11-26 07:45:28,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3288780.0, ans=0.0 2023-11-26 07:45:33,047 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 350, loss[loss=0.06017, simple_loss=0.07944, pruned_loss=0.008959, audio_tagging_loss=0.01148, over 15244.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.08942, pruned_loss=0.01242, audio_tagging_loss=0.01051, over 2523177.51 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:45:33,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3288846.6666666665, ans=0.2 2023-11-26 07:45:56,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. 
limit=15.0 2023-11-26 07:45:56,623 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493350 2023-11-26 07:46:06,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.599e+01 9.325e+01 1.001e+02 1.376e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 07:46:20,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3289113.3333333335, ans=0.125 2023-11-26 07:46:29,001 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 400, loss[loss=0.06794, simple_loss=0.09858, pruned_loss=0.01087, audio_tagging_loss=0.007783, over 15942.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.08946, pruned_loss=0.0124, audio_tagging_loss=0.01018, over 2640212.10 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:46:39,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2023-11-26 07:46:52,945 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493400 2023-11-26 07:46:56,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3289313.3333333335, ans=0.125 2023-11-26 07:47:03,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3289380.0, ans=0.1 2023-11-26 07:47:10,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3289380.0, ans=0.0 2023-11-26 07:47:15,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-26 07:47:17,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-26 07:47:18,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-26 07:47:24,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2023-11-26 07:47:25,003 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 450, loss[loss=0.07867, simple_loss=0.102, pruned_loss=0.01914, audio_tagging_loss=0.008519, over 15528.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.08934, pruned_loss=0.0124, audio_tagging_loss=0.009898, over 2732572.78 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:47:32,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3289513.3333333335, ans=0.02 2023-11-26 07:47:44,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289580.0, ans=0.1 2023-11-26 07:47:48,965 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493450 2023-11-26 07:47:52,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.42 vs. 
limit=22.5 2023-11-26 07:47:59,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 8.870e+01 9.366e+01 1.009e+02 1.216e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 07:48:01,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3289713.3333333335, ans=0.2 2023-11-26 07:48:02,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3289713.3333333335, ans=0.07 2023-11-26 07:48:05,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3289713.3333333335, ans=0.0 2023-11-26 07:48:15,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289780.0, ans=0.1 2023-11-26 07:48:18,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-11-26 07:48:21,399 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 500, loss[loss=0.09697, simple_loss=0.1387, pruned_loss=0.02158, audio_tagging_loss=0.006051, over 15199.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09018, pruned_loss=0.0125, audio_tagging_loss=0.009606, over 2794340.24 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:48:32,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2023-11-26 07:48:35,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3289913.3333333335, ans=0.0 2023-11-26 07:48:36,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3289913.3333333335, ans=0.07 2023-11-26 07:48:39,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289913.3333333335, ans=0.1 2023-11-26 07:48:40,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-26 07:48:44,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493500 2023-11-26 07:48:57,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3290046.6666666665, ans=0.0 2023-11-26 07:49:17,266 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 550, loss[loss=0.05679, simple_loss=0.07695, pruned_loss=0.00856, audio_tagging_loss=0.009752, over 15090.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08914, pruned_loss=0.01236, audio_tagging_loss=0.009461, over 2844648.31 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:49:17,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3290180.0, ans=0.125 2023-11-26 07:49:33,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.54 vs. 
limit=10.0 2023-11-26 07:49:35,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3290246.6666666665, ans=0.125 2023-11-26 07:49:40,768 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493550 2023-11-26 07:49:51,813 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.897e+01 9.489e+01 1.022e+02 1.296e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 07:49:54,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3290380.0, ans=0.125 2023-11-26 07:50:07,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.97 vs. limit=15.0 2023-11-26 07:50:13,150 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 600, loss[loss=0.05231, simple_loss=0.06262, pruned_loss=0.01214, audio_tagging_loss=0.008855, over 14110.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.08992, pruned_loss=0.01257, audio_tagging_loss=0.009319, over 2888747.36 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:50:24,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3290580.0, ans=0.125 2023-11-26 07:50:24,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3290580.0, ans=0.2 2023-11-26 07:50:26,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3290580.0, ans=0.125 2023-11-26 07:50:34,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3290646.6666666665, ans=0.07 2023-11-26 07:50:36,757 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493600 2023-11-26 07:50:43,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3290646.6666666665, ans=0.1 2023-11-26 07:50:58,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=12.0 2023-11-26 07:51:09,758 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 650, loss[loss=0.07374, simple_loss=0.102, pruned_loss=0.01582, audio_tagging_loss=0.006944, over 14985.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.0897, pruned_loss=0.01248, audio_tagging_loss=0.009323, over 2923146.63 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:51:15,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5 2023-11-26 07:51:22,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3290913.3333333335, ans=0.125 2023-11-26 07:51:32,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493650 2023-11-26 07:51:32,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3290980.0, ans=0.5 2023-11-26 07:51:36,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.28 vs. 
limit=15.0 2023-11-26 07:51:42,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3291046.6666666665, ans=0.95 2023-11-26 07:51:44,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.602e+01 9.245e+01 1.014e+02 1.320e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 07:52:05,558 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 700, loss[loss=0.08708, simple_loss=0.1192, pruned_loss=0.02128, audio_tagging_loss=0.006206, over 14285.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08994, pruned_loss=0.01234, audio_tagging_loss=0.009165, over 2950724.74 frames. ], batch size: 51, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:52:07,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2023-11-26 07:52:29,162 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493700 2023-11-26 07:52:37,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3291313.3333333335, ans=0.1 2023-11-26 07:52:44,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-26 07:52:56,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3291446.6666666665, ans=0.0 2023-11-26 07:52:57,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.24 vs. limit=22.5 2023-11-26 07:52:59,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3291446.6666666665, ans=0.1 2023-11-26 07:53:01,093 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 750, loss[loss=0.0629, simple_loss=0.09161, pruned_loss=0.00995, audio_tagging_loss=0.007144, over 16746.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09132, pruned_loss=0.01254, audio_tagging_loss=0.009051, over 2977701.19 frames. ], batch size: 64, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:53:04,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=3291513.3333333335, ans=0.02 2023-11-26 07:53:07,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.56 vs. 
limit=22.5 2023-11-26 07:53:12,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3291580.0, ans=0.1 2023-11-26 07:53:23,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3291646.6666666665, ans=0.125 2023-11-26 07:53:25,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493750 2023-11-26 07:53:36,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.579e+01 9.292e+01 9.836e+01 1.327e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 07:53:38,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3291713.3333333335, ans=0.0 2023-11-26 07:53:42,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3291713.3333333335, ans=0.125 2023-11-26 07:53:47,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3291780.0, ans=0.015 2023-11-26 07:53:49,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3291780.0, ans=0.125 2023-11-26 07:53:58,246 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 800, loss[loss=0.0822, simple_loss=0.1171, pruned_loss=0.01653, audio_tagging_loss=0.007115, over 15210.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09161, pruned_loss=0.01263, audio_tagging_loss=0.009075, over 2990952.08 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:54:09,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3291913.3333333335, ans=0.2 2023-11-26 07:54:11,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3291913.3333333335, ans=0.1 2023-11-26 07:54:21,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493800 2023-11-26 07:54:27,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2023-11-26 07:54:32,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3292046.6666666665, ans=0.0 2023-11-26 07:54:53,896 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 850, loss[loss=0.05498, simple_loss=0.06468, pruned_loss=0.01345, audio_tagging_loss=0.009192, over 14634.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09131, pruned_loss=0.01277, audio_tagging_loss=0.009153, over 2999933.25 frames. 
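The many "balancer" ScheduledFloats parameterize activation balancers: modules that, with the scheduled probability (prob=0.125 in most lines), inspect per-channel statistics and nudge gradients back toward a healthy regime when a channel's fraction of positive activations leaves [min_positive, max_positive] (0.05 / 0.95 in these lines) or its mean absolute value leaves [min_abs, max_abs] (0.02 / 10.0). The gradient-side correction lives in scaling.py and is more involved; this sketch only reproduces the statistics check:

    import torch

    def balancer_violations(x: torch.Tensor,
                            min_positive=0.05, max_positive=0.95,
                            min_abs=0.02, max_abs=10.0):
        """Boolean masks of channels breaching each balancing constraint.
        x: (..., num_channels)."""
        flat = x.reshape(-1, x.shape[-1])
        frac_pos = (flat > 0).float().mean(dim=0)
        mean_abs = flat.abs().mean(dim=0)
        return {
            "too_negative": frac_pos < min_positive,
            "too_positive": frac_pos > max_positive,
            "too_small": mean_abs < min_abs,
            "too_large": mean_abs > max_abs,
        }

    stats = balancer_violations(torch.randn(4096, 256) * 0.5 + 0.1)
    print({k: int(v.sum()) for k, v in stats.items()})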
], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:55:11,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3292246.6666666665, ans=0.1 2023-11-26 07:55:12,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3292246.6666666665, ans=0.2 2023-11-26 07:55:12,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3292246.6666666665, ans=0.07 2023-11-26 07:55:16,671 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493850 2023-11-26 07:55:17,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2023-11-26 07:55:17,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3292313.3333333335, ans=0.2 2023-11-26 07:55:17,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3292313.3333333335, ans=0.2 2023-11-26 07:55:29,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.705e+01 8.678e+01 9.372e+01 1.019e+02 1.445e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 07:55:36,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3292380.0, ans=0.125 2023-11-26 07:55:48,980 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 900, loss[loss=0.06217, simple_loss=0.07954, pruned_loss=0.01103, audio_tagging_loss=0.01137, over 15394.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09089, pruned_loss=0.01256, audio_tagging_loss=0.009236, over 3015894.48 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:55:49,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2023-11-26 07:55:59,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3292580.0, ans=0.125 2023-11-26 07:56:13,205 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493900 2023-11-26 07:56:13,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3292646.6666666665, ans=0.0 2023-11-26 07:56:16,513 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:56:36,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3292780.0, ans=0.2 2023-11-26 07:56:45,002 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 950, loss[loss=0.05356, simple_loss=0.07787, pruned_loss=0.006255, audio_tagging_loss=0.008373, over 15673.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09023, pruned_loss=0.01237, audio_tagging_loss=0.009172, over 3027107.30 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:56:48,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. 
limit=15.0 2023-11-26 07:56:56,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3292913.3333333335, ans=0.1 2023-11-26 07:57:09,223 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493950 2023-11-26 07:57:09,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.96 vs. limit=6.0 2023-11-26 07:57:15,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3292980.0, ans=0.125 2023-11-26 07:57:16,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3292980.0, ans=0.1 2023-11-26 07:57:20,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3293046.6666666665, ans=0.125 2023-11-26 07:57:20,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.738e+01 9.325e+01 9.888e+01 1.254e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 07:57:41,275 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1000, loss[loss=0.05893, simple_loss=0.08035, pruned_loss=0.00938, audio_tagging_loss=0.009374, over 14661.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09107, pruned_loss=0.01256, audio_tagging_loss=0.008942, over 3030602.53 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:57:41,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3293180.0, ans=0.125 2023-11-26 07:57:45,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3293180.0, ans=0.125 2023-11-26 07:57:47,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3293180.0, ans=0.0 2023-11-26 07:57:49,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3293180.0, ans=0.125 2023-11-26 07:57:57,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.71 vs. limit=15.0 2023-11-26 07:57:59,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.02 vs. limit=10.0 2023-11-26 07:58:04,309 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 07:58:04,341 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494000 2023-11-26 07:58:16,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3293380.0, ans=0.125 2023-11-26 07:58:30,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3293446.6666666665, ans=0.0 2023-11-26 07:58:30,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3293446.6666666665, ans=0.025 2023-11-26 07:58:36,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=12.0 2023-11-26 07:58:37,593 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1050, loss[loss=0.06216, simple_loss=0.08533, pruned_loss=0.01059, audio_tagging_loss=0.008913, over 15547.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09075, pruned_loss=0.01256, audio_tagging_loss=0.00891, over 3040973.46 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:58:44,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3293513.3333333335, ans=0.1 2023-11-26 07:58:45,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3293513.3333333335, ans=0.125 2023-11-26 07:58:50,229 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:58:51,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3293580.0, ans=0.1 2023-11-26 07:58:56,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3293580.0, ans=0.1 2023-11-26 07:58:57,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2023-11-26 07:59:01,674 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494050 2023-11-26 07:59:06,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3293646.6666666665, ans=0.125 2023-11-26 07:59:13,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 8.710e+01 9.431e+01 1.020e+02 1.408e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 07:59:15,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0 2023-11-26 07:59:17,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3293713.3333333335, ans=0.125 2023-11-26 07:59:33,783 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1100, loss[loss=0.06839, simple_loss=0.07547, pruned_loss=0.01877, audio_tagging_loss=0.01189, over 16325.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09023, pruned_loss=0.01259, audio_tagging_loss=0.008831, over 3038197.30 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:59:36,070 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:59:41,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3293846.6666666665, ans=0.125 2023-11-26 07:59:46,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0 2023-11-26 07:59:58,042 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494100 2023-11-26 08:00:01,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3293980.0, ans=0.125 2023-11-26 08:00:11,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2023-11-26 08:00:24,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3294113.3333333335, ans=0.125 2023-11-26 08:00:30,399 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1150, loss[loss=0.07343, simple_loss=0.1004, pruned_loss=0.01413, audio_tagging_loss=0.009095, over 16694.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09005, pruned_loss=0.01263, audio_tagging_loss=0.008796, over 3041385.65 frames. ], batch size: 63, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:00:49,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3294246.6666666665, ans=0.125 2023-11-26 08:00:53,167 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494150 2023-11-26 08:01:05,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.571e+01 9.145e+01 9.893e+01 1.532e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-26 08:01:14,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3294446.6666666665, ans=0.1 2023-11-26 08:01:26,213 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1200, loss[loss=0.07983, simple_loss=0.1163, pruned_loss=0.01736, audio_tagging_loss=0.004296, over 15634.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09106, pruned_loss=0.01273, audio_tagging_loss=0.008678, over 3042147.88 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:01:32,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3294513.3333333335, ans=0.2 2023-11-26 08:01:41,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3294580.0, ans=0.0 2023-11-26 08:01:49,750 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494200 2023-11-26 08:01:56,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3294646.6666666665, ans=0.1 2023-11-26 08:02:10,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3294780.0, ans=0.0 2023-11-26 08:02:22,009 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1250, loss[loss=0.07037, simple_loss=0.09854, pruned_loss=0.0129, audio_tagging_loss=0.008196, over 14732.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09065, pruned_loss=0.01261, audio_tagging_loss=0.008565, over 3043390.19 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:02:24,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0 2023-11-26 08:02:33,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3294913.3333333335, ans=0.0 2023-11-26 08:02:41,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3294913.3333333335, ans=0.0 2023-11-26 08:02:46,662 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494250 2023-11-26 08:02:50,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3294980.0, ans=0.0 2023-11-26 08:02:58,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.635e+01 9.244e+01 9.927e+01 1.336e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 08:03:11,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3295113.3333333335, ans=0.125 2023-11-26 08:03:18,631 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1300, loss[loss=0.0729, simple_loss=0.107, pruned_loss=0.0103, audio_tagging_loss=0.00909, over 15137.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09083, pruned_loss=0.01257, audio_tagging_loss=0.008582, over 3049142.69 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:03:18,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3295180.0, ans=0.0 2023-11-26 08:03:19,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3295180.0, ans=0.125 2023-11-26 08:03:32,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=22.5 2023-11-26 08:03:42,044 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494300 2023-11-26 08:04:01,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.01 vs. 
limit=6.0 2023-11-26 08:04:14,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3295513.3333333335, ans=0.125 2023-11-26 08:04:14,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3295513.3333333335, ans=0.125 2023-11-26 08:04:14,939 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1350, loss[loss=0.06169, simple_loss=0.08544, pruned_loss=0.01074, audio_tagging_loss=0.00823, over 15558.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09103, pruned_loss=0.01253, audio_tagging_loss=0.008587, over 3049572.72 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:04:18,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3295513.3333333335, ans=0.125 2023-11-26 08:04:19,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3295513.3333333335, ans=0.0 2023-11-26 08:04:26,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3295580.0, ans=0.1 2023-11-26 08:04:30,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=22.5 2023-11-26 08:04:38,706 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494350 2023-11-26 08:04:49,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3295713.3333333335, ans=0.125 2023-11-26 08:04:49,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3295713.3333333335, ans=0.125 2023-11-26 08:04:49,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3295713.3333333335, ans=0.125 2023-11-26 08:04:52,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.816e+01 9.406e+01 1.018e+02 1.240e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 08:04:53,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0 2023-11-26 08:04:55,783 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:05:04,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3295780.0, ans=0.2 2023-11-26 08:05:10,840 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1400, loss[loss=0.05735, simple_loss=0.0748, pruned_loss=0.01086, audio_tagging_loss=0.009084, over 14490.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09105, pruned_loss=0.01254, audio_tagging_loss=0.008645, over 3055864.68 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:05:27,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3295913.3333333335, ans=0.05 2023-11-26 08:05:31,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2023-11-26 08:05:34,860 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494400 2023-11-26 08:05:34,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3295980.0, ans=0.125 2023-11-26 08:05:39,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3295980.0, ans=0.1 2023-11-26 08:05:42,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.77 vs. limit=22.5 2023-11-26 08:05:44,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-26 08:06:07,644 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1450, loss[loss=0.0591, simple_loss=0.08319, pruned_loss=0.009328, audio_tagging_loss=0.008178, over 15909.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09101, pruned_loss=0.01251, audio_tagging_loss=0.008719, over 3063448.13 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:06:16,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3296180.0, ans=0.125 2023-11-26 08:06:31,014 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494450 2023-11-26 08:06:44,218 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.868e+01 9.341e+01 9.992e+01 1.188e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 08:07:04,080 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1500, loss[loss=0.05825, simple_loss=0.07911, pruned_loss=0.01002, audio_tagging_loss=0.008681, over 14926.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09096, pruned_loss=0.01255, audio_tagging_loss=0.008822, over 3058837.27 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:07:11,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3296513.3333333335, ans=0.0 2023-11-26 08:07:16,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2023-11-26 08:07:27,049 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494500 2023-11-26 08:07:27,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3296646.6666666665, ans=0.125 2023-11-26 08:07:52,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. 
limit=15.0 2023-11-26 08:07:54,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3296780.0, ans=0.125 2023-11-26 08:07:56,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3296780.0, ans=0.0 2023-11-26 08:07:59,527 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1550, loss[loss=0.06417, simple_loss=0.08423, pruned_loss=0.01189, audio_tagging_loss=0.01017, over 15052.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.091, pruned_loss=0.01255, audio_tagging_loss=0.008935, over 3052630.51 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:08:16,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3296913.3333333335, ans=0.125 2023-11-26 08:08:22,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494550 2023-11-26 08:08:22,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3296980.0, ans=0.125 2023-11-26 08:08:26,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.25 vs. limit=15.0 2023-11-26 08:08:28,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3296980.0, ans=0.0 2023-11-26 08:08:36,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.636e+01 9.426e+01 1.014e+02 1.319e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 08:08:54,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=22.5 2023-11-26 08:08:55,625 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1600, loss[loss=0.0769, simple_loss=0.1004, pruned_loss=0.01928, audio_tagging_loss=0.007437, over 15676.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09109, pruned_loss=0.01249, audio_tagging_loss=0.008979, over 3057533.35 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:09:00,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3297180.0, ans=0.125 2023-11-26 08:09:01,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3297180.0, ans=0.125 2023-11-26 08:09:18,993 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494600 2023-11-26 08:09:19,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=22.5 2023-11-26 08:09:25,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3297313.3333333335, ans=0.0 2023-11-26 08:09:36,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3297380.0, ans=0.0 2023-11-26 08:09:51,645 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1650, loss[loss=0.07321, simple_loss=0.09732, pruned_loss=0.01774, audio_tagging_loss=0.006809, over 14560.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09041, pruned_loss=0.0124, audio_tagging_loss=0.009076, over 3048070.39 frames. 
], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:10:00,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3297513.3333333335, ans=0.125 2023-11-26 08:10:15,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494650 2023-11-26 08:10:28,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.801e+01 9.353e+01 1.001e+02 1.567e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 08:10:47,520 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1700, loss[loss=0.07579, simple_loss=0.1062, pruned_loss=0.01126, audio_tagging_loss=0.01144, over 15120.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09017, pruned_loss=0.01233, audio_tagging_loss=0.009077, over 3042060.16 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:10:54,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3297846.6666666665, ans=0.125 2023-11-26 08:11:10,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3297980.0, ans=0.1 2023-11-26 08:11:11,007 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494700 2023-11-26 08:11:20,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3298046.6666666665, ans=0.125 2023-11-26 08:11:31,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2023-11-26 08:11:33,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3298113.3333333335, ans=0.1 2023-11-26 08:11:43,207 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1750, loss[loss=0.0691, simple_loss=0.1003, pruned_loss=0.01092, audio_tagging_loss=0.008051, over 15326.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08916, pruned_loss=0.01233, audio_tagging_loss=0.008979, over 3041141.85 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:11:55,514 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:12:03,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3298246.6666666665, ans=0.2 2023-11-26 08:12:06,566 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494750 2023-11-26 08:12:21,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.900e+01 9.496e+01 1.021e+02 1.422e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 08:12:30,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3298446.6666666665, ans=0.125 2023-11-26 08:12:34,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3298446.6666666665, ans=0.025 2023-11-26 08:12:36,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3298446.6666666665, ans=0.0 2023-11-26 08:12:39,607 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1800, loss[loss=0.09016, simple_loss=0.1253, pruned_loss=0.01959, audio_tagging_loss=0.007898, over 15804.00 frames. 
], tot_loss[loss=0.06522, simple_loss=0.08825, pruned_loss=0.01219, audio_tagging_loss=0.008905, over 3034456.90 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:12:45,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3298513.3333333335, ans=0.1 2023-11-26 08:12:45,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3298513.3333333335, ans=0.0 2023-11-26 08:12:51,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3298580.0, ans=0.125 2023-11-26 08:12:57,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3298580.0, ans=0.125 2023-11-26 08:13:00,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3298646.6666666665, ans=0.125 2023-11-26 08:13:01,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3298646.6666666665, ans=0.125 2023-11-26 08:13:03,003 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494800 2023-11-26 08:13:19,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=22.5 2023-11-26 08:13:35,228 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1850, loss[loss=0.08368, simple_loss=0.1147, pruned_loss=0.01589, audio_tagging_loss=0.01046, over 15085.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.0888, pruned_loss=0.01226, audio_tagging_loss=0.008846, over 3037001.84 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:13:46,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3298913.3333333335, ans=0.0 2023-11-26 08:13:51,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3298913.3333333335, ans=0.2 2023-11-26 08:13:52,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3298913.3333333335, ans=0.125 2023-11-26 08:13:55,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3298913.3333333335, ans=0.125 2023-11-26 08:13:59,424 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494850 2023-11-26 08:14:09,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3299046.6666666665, ans=10.0 2023-11-26 08:14:13,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.28 vs. 
limit=15.0 2023-11-26 08:14:14,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.832e+01 9.434e+01 1.017e+02 1.223e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 08:14:15,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3299046.6666666665, ans=0.125 2023-11-26 08:14:27,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3299113.3333333335, ans=0.0 2023-11-26 08:14:28,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3299113.3333333335, ans=0.0 2023-11-26 08:14:31,939 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1900, loss[loss=0.06014, simple_loss=0.07588, pruned_loss=0.01211, audio_tagging_loss=0.0101, over 15874.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08833, pruned_loss=0.01223, audio_tagging_loss=0.008819, over 3042255.24 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:14:42,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3299246.6666666665, ans=0.0 2023-11-26 08:14:46,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3299246.6666666665, ans=0.125 2023-11-26 08:14:55,258 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494900 2023-11-26 08:15:00,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2023-11-26 08:15:19,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3299446.6666666665, ans=0.0 2023-11-26 08:15:24,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.43 vs. limit=15.0 2023-11-26 08:15:27,389 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1950, loss[loss=0.06735, simple_loss=0.09249, pruned_loss=0.01067, audio_tagging_loss=0.01043, over 15027.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08848, pruned_loss=0.01219, audio_tagging_loss=0.008785, over 3045691.08 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:15:51,045 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494950 2023-11-26 08:16:06,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.592e+01 9.452e+01 9.958e+01 1.219e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 08:16:13,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2023-11-26 08:16:19,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3299780.0, ans=0.0 2023-11-26 08:16:20,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3299780.0, ans=0.2 2023-11-26 08:16:23,461 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2000, loss[loss=0.05601, simple_loss=0.07081, pruned_loss=0.01018, audio_tagging_loss=0.01043, over 15085.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08849, pruned_loss=0.01216, audio_tagging_loss=0.008742, over 3052499.03 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:16:25,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0 2023-11-26 08:16:27,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3299846.6666666665, ans=0.125 2023-11-26 08:16:46,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3299980.0, ans=0.5 2023-11-26 08:16:47,414 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495000 2023-11-26 08:16:48,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3299980.0, ans=0.2 2023-11-26 08:17:05,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3300046.6666666665, ans=0.0 2023-11-26 08:17:07,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=12.0 2023-11-26 08:17:19,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3300180.0, ans=0.1 2023-11-26 08:17:19,880 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2050, loss[loss=0.05485, simple_loss=0.07368, pruned_loss=0.008043, audio_tagging_loss=0.00996, over 15116.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.0886, pruned_loss=0.01218, audio_tagging_loss=0.008665, over 3046385.95 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:17:21,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3300180.0, ans=0.125 2023-11-26 08:17:43,465 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495050 2023-11-26 08:17:52,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2023-11-26 08:17:58,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 8.680e+01 9.276e+01 1.017e+02 1.208e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 08:18:00,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3300380.0, ans=0.125 2023-11-26 08:18:07,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3300446.6666666665, ans=0.0 2023-11-26 08:18:08,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3300446.6666666665, ans=0.125 2023-11-26 08:18:16,051 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2100, loss[loss=0.06624, simple_loss=0.09409, pruned_loss=0.009554, audio_tagging_loss=0.009638, over 14952.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08897, pruned_loss=0.01217, audio_tagging_loss=0.008712, over 3043276.92 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:18:31,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3300580.0, ans=0.2 2023-11-26 08:18:39,018 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495100 2023-11-26 08:18:46,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3300646.6666666665, ans=0.2 2023-11-26 08:18:51,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3300713.3333333335, ans=0.0 2023-11-26 08:19:11,876 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2150, loss[loss=0.05997, simple_loss=0.0814, pruned_loss=0.009861, audio_tagging_loss=0.009406, over 15932.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.0891, pruned_loss=0.01208, audio_tagging_loss=0.008715, over 3042048.11 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:19:19,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3300846.6666666665, ans=0.2 2023-11-26 08:19:20,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3300846.6666666665, ans=0.125 2023-11-26 08:19:21,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3300913.3333333335, ans=0.125 2023-11-26 08:19:31,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3300913.3333333335, ans=0.2 2023-11-26 08:19:35,785 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495150 2023-11-26 08:19:44,898 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:19:51,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.871e+01 9.357e+01 1.023e+02 1.211e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 08:19:55,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3301113.3333333335, ans=0.125 2023-11-26 08:19:57,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3301113.3333333335, ans=0.125 2023-11-26 08:20:01,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0 2023-11-26 08:20:06,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3301180.0, ans=0.125 2023-11-26 08:20:07,197 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2200, loss[loss=0.0602, simple_loss=0.08528, pruned_loss=0.01002, audio_tagging_loss=0.007535, over 15228.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08902, pruned_loss=0.01221, audio_tagging_loss=0.008731, over 3039886.61 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:20:16,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3301180.0, ans=0.1 2023-11-26 08:20:22,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3301246.6666666665, ans=0.125 2023-11-26 08:20:26,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=12.0 2023-11-26 08:20:27,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3301246.6666666665, ans=0.2 2023-11-26 08:20:28,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3301246.6666666665, ans=0.1 2023-11-26 08:20:31,788 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495200 2023-11-26 08:20:38,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3301313.3333333335, ans=0.1 2023-11-26 08:20:49,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3301380.0, ans=0.125 2023-11-26 08:21:02,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3301446.6666666665, ans=0.2 2023-11-26 08:21:04,553 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2250, loss[loss=0.07176, simple_loss=0.1033, pruned_loss=0.01322, audio_tagging_loss=0.006879, over 14890.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0896, pruned_loss=0.01222, audio_tagging_loss=0.008742, over 3041127.33 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:21:06,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3301513.3333333335, ans=0.125 2023-11-26 08:21:07,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-11-26 08:21:27,599 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495250 2023-11-26 08:21:29,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3301646.6666666665, ans=0.2 2023-11-26 08:21:40,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3301713.3333333335, ans=0.125 2023-11-26 08:21:43,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.823e+01 9.427e+01 1.035e+02 1.716e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 08:21:56,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2023-11-26 08:22:00,182 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2300, loss[loss=0.07434, simple_loss=0.09841, pruned_loss=0.01575, audio_tagging_loss=0.009393, over 15537.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09005, pruned_loss=0.01234, audio_tagging_loss=0.008689, over 3049116.44 frames. 
], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:22:17,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3301913.3333333335, ans=0.125 2023-11-26 08:22:21,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3301980.0, ans=0.125 2023-11-26 08:22:23,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495300 2023-11-26 08:22:27,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3301980.0, ans=0.0 2023-11-26 08:22:31,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3301980.0, ans=0.07 2023-11-26 08:22:40,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-11-26 08:22:48,096 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:22:50,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3302113.3333333335, ans=0.125 2023-11-26 08:22:54,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3302180.0, ans=0.0 2023-11-26 08:22:55,591 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2350, loss[loss=0.07349, simple_loss=0.1053, pruned_loss=0.01222, audio_tagging_loss=0.0086, over 15298.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09023, pruned_loss=0.01236, audio_tagging_loss=0.008738, over 3047634.98 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:22:55,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3302180.0, ans=0.0 2023-11-26 08:23:00,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3302180.0, ans=0.0 2023-11-26 08:23:09,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3302246.6666666665, ans=0.2 2023-11-26 08:23:11,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3302246.6666666665, ans=0.125 2023-11-26 08:23:20,226 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495350 2023-11-26 08:23:20,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3302313.3333333335, ans=0.1 2023-11-26 08:23:25,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.18 vs. limit=15.0 2023-11-26 08:23:25,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. 
limit=15.0 2023-11-26 08:23:34,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.737e+01 9.480e+01 1.014e+02 1.457e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 08:23:37,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3302380.0, ans=0.125 2023-11-26 08:23:51,967 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2400, loss[loss=0.08325, simple_loss=0.117, pruned_loss=0.01744, audio_tagging_loss=0.007311, over 14138.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09092, pruned_loss=0.01265, audio_tagging_loss=0.008754, over 3047534.65 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:23:54,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=22.5 2023-11-26 08:23:58,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2023-11-26 08:23:59,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3302513.3333333335, ans=0.125 2023-11-26 08:24:15,473 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495400 2023-11-26 08:24:48,737 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2450, loss[loss=0.05289, simple_loss=0.07209, pruned_loss=0.006194, audio_tagging_loss=0.01065, over 15330.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09079, pruned_loss=0.01268, audio_tagging_loss=0.008878, over 3044506.37 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:24:50,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3302846.6666666665, ans=0.0 2023-11-26 08:24:50,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2023-11-26 08:24:53,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3302846.6666666665, ans=0.125 2023-11-26 08:24:53,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3302846.6666666665, ans=0.1 2023-11-26 08:25:00,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3302913.3333333335, ans=0.1 2023-11-26 08:25:11,595 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495450 2023-11-26 08:25:19,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3302980.0, ans=0.125 2023-11-26 08:25:27,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3303046.6666666665, ans=0.0 2023-11-26 08:25:28,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 8.728e+01 9.406e+01 1.027e+02 1.574e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 08:25:43,715 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2500, loss[loss=0.07171, simple_loss=0.0983, pruned_loss=0.01285, audio_tagging_loss=0.009712, over 15362.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09103, pruned_loss=0.01257, audio_tagging_loss=0.008965, over 3044843.12 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:25:56,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3303246.6666666665, ans=15.0 2023-11-26 08:26:00,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2023-11-26 08:26:07,798 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495500 2023-11-26 08:26:12,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3303313.3333333335, ans=0.0 2023-11-26 08:26:15,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3303313.3333333335, ans=0.0 2023-11-26 08:26:24,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3303380.0, ans=0.125 2023-11-26 08:26:34,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3303446.6666666665, ans=0.125 2023-11-26 08:26:38,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3303513.3333333335, ans=0.0 2023-11-26 08:26:39,655 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2550, loss[loss=0.06476, simple_loss=0.08734, pruned_loss=0.01166, audio_tagging_loss=0.009431, over 14798.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.0904, pruned_loss=0.01244, audio_tagging_loss=0.008836, over 3038786.15 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:27:03,214 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495550 2023-11-26 08:27:07,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3303646.6666666665, ans=0.2 2023-11-26 08:27:15,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0 2023-11-26 08:27:19,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.568e+01 9.109e+01 1.007e+02 1.472e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 08:27:35,777 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2600, loss[loss=0.04449, simple_loss=0.05956, pruned_loss=0.005312, audio_tagging_loss=0.009392, over 15542.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08861, pruned_loss=0.01211, audio_tagging_loss=0.008781, over 3036241.50 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:27:46,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3303913.3333333335, ans=15.0 2023-11-26 08:27:53,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. 
limit=15.0 2023-11-26 08:27:58,819 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495600 2023-11-26 08:28:06,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3303980.0, ans=0.1 2023-11-26 08:28:08,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0 2023-11-26 08:28:26,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3304113.3333333335, ans=0.125 2023-11-26 08:28:30,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3304180.0, ans=0.125 2023-11-26 08:28:31,471 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2650, loss[loss=0.04816, simple_loss=0.0695, pruned_loss=0.005373, audio_tagging_loss=0.008037, over 15765.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08865, pruned_loss=0.01212, audio_tagging_loss=0.008739, over 3035981.74 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:28:41,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3304246.6666666665, ans=0.2 2023-11-26 08:28:42,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3304246.6666666665, ans=0.125 2023-11-26 08:28:48,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3304246.6666666665, ans=0.125 2023-11-26 08:28:49,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3304246.6666666665, ans=0.0 2023-11-26 08:28:54,933 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495650 2023-11-26 08:28:56,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3304313.3333333335, ans=0.125 2023-11-26 08:29:06,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2023-11-26 08:29:11,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3304380.0, ans=0.125 2023-11-26 08:29:11,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3304380.0, ans=0.0 2023-11-26 08:29:11,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.718e+01 9.187e+01 9.929e+01 1.273e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 08:29:14,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3304380.0, ans=0.1 2023-11-26 08:29:26,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2023-11-26 08:29:27,435 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2700, loss[loss=0.04681, simple_loss=0.05874, pruned_loss=0.008523, audio_tagging_loss=0.008915, over 15061.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08816, pruned_loss=0.01198, audio_tagging_loss=0.008708, over 3044977.04 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 08:29:34,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3304513.3333333335, ans=0.125
2023-11-26 08:29:51,637 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495700
2023-11-26 08:30:17,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3304780.0, ans=0.0
2023-11-26 08:30:19,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3304780.0, ans=0.125
2023-11-26 08:30:23,769 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2750, loss[loss=0.04491, simple_loss=0.05418, pruned_loss=0.006988, audio_tagging_loss=0.01084, over 16250.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08714, pruned_loss=0.01181, audio_tagging_loss=0.008792, over 3045258.07 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 08:30:26,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3304846.6666666665, ans=0.0
2023-11-26 08:30:43,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3304913.3333333335, ans=0.125
2023-11-26 08:30:46,675 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495750
2023-11-26 08:31:03,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.804e+01 8.931e+01 9.557e+01 1.024e+02 1.484e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-26 08:31:08,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3305113.3333333335, ans=0.125
2023-11-26 08:31:10,386 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 08:31:19,455 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2800, loss[loss=0.04435, simple_loss=0.0565, pruned_loss=0.006159, audio_tagging_loss=0.009944, over 13318.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08803, pruned_loss=0.01193, audio_tagging_loss=0.008723, over 3038360.42 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:31:20,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0
2023-11-26 08:31:20,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3305180.0, ans=0.0
2023-11-26 08:31:22,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.85 vs. limit=6.0
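The WARNING above shows the length filter at work: a one-second AudioSet clip yields only 23 encoder frames after subsampling, fewer than its 24-token placeholder transcript, so the transducer loss would have no valid alignment and the cut is dropped. A sketch of the criterion implied by the logged numbers (the subsampling arithmetic below, two stride-2 reductions, is an assumption that reproduces 100 -> 23):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed frontend arithmetic: two successive stride-2 convolutions
        # reduce 100 input frames to 23 output frames.
        t = ((num_frames - 7) // 2 + 1) // 2
        return t >= num_tokens

    print(keep_cut(100, 24))  # False -> excluded, as in the warning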
2023-11-26 08:31:35,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3305246.6666666665, ans=0.0
2023-11-26 08:31:43,129 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495800
2023-11-26 08:31:43,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3305313.3333333335, ans=0.125
2023-11-26 08:31:47,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3305313.3333333335, ans=0.09899494936611666
2023-11-26 08:31:55,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=12.0
2023-11-26 08:32:09,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3305446.6666666665, ans=0.07
2023-11-26 08:32:15,819 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2850, loss[loss=0.07616, simple_loss=0.1023, pruned_loss=0.01581, audio_tagging_loss=0.009225, over 15921.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08809, pruned_loss=0.01197, audio_tagging_loss=0.00871, over 3045958.42 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:32:20,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.54 vs. limit=15.0
2023-11-26 08:32:24,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3305513.3333333335, ans=0.04949747468305833
2023-11-26 08:32:28,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3305580.0, ans=0.125
2023-11-26 08:32:28,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0
2023-11-26 08:32:34,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3305580.0, ans=0.125
2023-11-26 08:32:35,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3305580.0, ans=0.125
2023-11-26 08:32:39,509 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495850
2023-11-26 08:32:48,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3305713.3333333335, ans=0.125
2023-11-26 08:32:55,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.658e+01 9.306e+01 1.021e+02 1.225e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-26 08:33:11,936 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2900, loss[loss=0.08598, simple_loss=0.1176, pruned_loss=0.01914, audio_tagging_loss=0.008035, over 14197.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08938, pruned_loss=0.01214, audio_tagging_loss=0.008614, over 3041129.72 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:33:12,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3305846.6666666665, ans=0.125
2023-11-26 08:33:22,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0
2023-11-26 08:33:27,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3305913.3333333335, ans=0.1
2023-11-26 08:33:34,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3305980.0, ans=0.0
2023-11-26 08:33:35,377 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495900
2023-11-26 08:33:44,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0
2023-11-26 08:33:54,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3306046.6666666665, ans=0.125
2023-11-26 08:34:06,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.33 vs. limit=22.5
2023-11-26 08:34:07,745 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2950, loss[loss=0.05059, simple_loss=0.07521, pruned_loss=0.006781, audio_tagging_loss=0.006206, over 16680.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08959, pruned_loss=0.01221, audio_tagging_loss=0.00859, over 3045250.67 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:34:20,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3306246.6666666665, ans=0.125
2023-11-26 08:34:29,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3306313.3333333335, ans=0.125
2023-11-26 08:34:31,579 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495950
2023-11-26 08:34:41,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3306380.0, ans=0.125
2023-11-26 08:34:48,006 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.833e+01 9.371e+01 1.025e+02 1.490e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-26 08:34:57,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3306446.6666666665, ans=0.125
2023-11-26 08:35:03,557 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3000, loss[loss=0.07176, simple_loss=0.1011, pruned_loss=0.011, audio_tagging_loss=0.01018, over 15312.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08945, pruned_loss=0.01224, audio_tagging_loss=0.008766, over 3045201.38 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:35:03,558 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-26 08:35:36,312 INFO [train_asr.py:1267] (1/4) Epoch 42, validation: loss=0.05776, simple_loss=0.05062, pruned_loss=0.005203, audio_tagging_loss=0.02725, over 4681554.00 frames.
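Each train_asr.py:1235 entry decomposes the objective into simple_loss, pruned_loss and audio_tagging_loss. The logged totals are consistent with a fixed linear recombination of roughly 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for batch 2550 above, 0.5 * 0.08734 + 0.01166 + 0.009431 comes to about 0.06476, the logged loss. A sketch with the scales inferred from that arithmetic (an assumption; the recipe may additionally warm the pruned term in over early batches):

    def total_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
        # Scales inferred from the logged entries, not copied from the
        # training script.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    print(total_loss(0.08734, 0.01166, 0.009431))  # ~0.06476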
2023-11-26 08:35:36,312 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-26 08:35:44,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3306513.3333333335, ans=0.0
2023-11-26 08:35:59,080 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496000
2023-11-26 08:35:59,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3306646.6666666665, ans=0.2
2023-11-26 08:36:09,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-11-26 08:36:11,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3306713.3333333335, ans=0.2
2023-11-26 08:36:25,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3306780.0, ans=0.125
2023-11-26 08:36:33,632 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3050, loss[loss=0.04332, simple_loss=0.05181, pruned_loss=0.00554, audio_tagging_loss=0.01187, over 14687.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08987, pruned_loss=0.01238, audio_tagging_loss=0.008817, over 3051267.56 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:36:40,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=15.0
2023-11-26 08:36:57,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496050
2023-11-26 08:37:05,398 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 08:37:08,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3307046.6666666665, ans=0.125
2023-11-26 08:37:13,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.651e+01 9.305e+01 1.008e+02 1.239e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-26 08:37:29,290 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3100, loss[loss=0.07483, simple_loss=0.1019, pruned_loss=0.015, audio_tagging_loss=0.008902, over 16071.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09167, pruned_loss=0.01266, audio_tagging_loss=0.008775, over 3049378.08 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:37:35,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5
2023-11-26 08:37:53,477 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496100
2023-11-26 08:38:03,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3307380.0, ans=0.0
2023-11-26 08:38:08,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3307380.0, ans=0.125
2023-11-26 08:38:20,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3307446.6666666665, ans=0.2
2023-11-26 08:38:25,835 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3150, loss[loss=0.08698, simple_loss=0.1201, pruned_loss=0.01721, audio_tagging_loss=0.009709, over 15780.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09071, pruned_loss=0.01253, audio_tagging_loss=0.008962, over 3051721.05 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:38:49,553 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496150
2023-11-26 08:38:50,790 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 08:39:06,519 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 8.861e+01 9.326e+01 1.004e+02 1.383e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-26 08:39:11,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0
2023-11-26 08:39:20,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3307780.0, ans=0.125
2023-11-26 08:39:22,073 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3200, loss[loss=0.06818, simple_loss=0.0949, pruned_loss=0.0128, audio_tagging_loss=0.007933, over 16506.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09082, pruned_loss=0.01253, audio_tagging_loss=0.008982, over 3049651.51 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:39:26,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3307846.6666666665, ans=0.125
2023-11-26 08:39:39,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3307913.3333333335, ans=0.0
2023-11-26 08:39:40,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3307913.3333333335, ans=0.1
2023-11-26 08:39:45,629 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496200
2023-11-26 08:39:52,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3307980.0, ans=0.1
2023-11-26 08:39:57,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=22.5
2023-11-26 08:40:18,472 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3250, loss[loss=0.05771, simple_loss=0.08683, pruned_loss=0.007501, audio_tagging_loss=0.006801, over 14796.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09041, pruned_loss=0.01248, audio_tagging_loss=0.009051, over 3050483.41 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:40:31,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3308246.6666666665, ans=0.04949747468305833
2023-11-26 08:40:32,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3308246.6666666665, ans=0.1
2023-11-26 08:40:38,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0
2023-11-26 08:40:42,217 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496250
2023-11-26 08:40:42,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=12.0
2023-11-26 08:40:58,072 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.751e+01 9.386e+01 1.020e+02 1.370e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-26 08:41:14,548 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3300, loss[loss=0.06674, simple_loss=0.09198, pruned_loss=0.01118, audio_tagging_loss=0.009574, over 15776.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09046, pruned_loss=0.0125, audio_tagging_loss=0.009046, over 3055104.17 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:41:34,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3308580.0, ans=0.1
2023-11-26 08:41:37,469 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496300
2023-11-26 08:42:07,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.37 vs. limit=10.0
2023-11-26 08:42:10,609 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3350, loss[loss=0.06915, simple_loss=0.09198, pruned_loss=0.01415, audio_tagging_loss=0.00901, over 15318.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09083, pruned_loss=0.01257, audio_tagging_loss=0.008977, over 3060087.07 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:42:17,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3308846.6666666665, ans=0.125
2023-11-26 08:42:27,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3308913.3333333335, ans=0.1
2023-11-26 08:42:33,885 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496350
2023-11-26 08:42:35,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3308980.0, ans=0.0
2023-11-26 08:42:50,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 8.810e+01 9.666e+01 1.064e+02 1.433e+02, threshold=1.933e+02, percent-clipped=0.0
2023-11-26 08:43:01,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3309113.3333333335, ans=0.07
2023-11-26 08:43:05,519 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3400, loss[loss=0.06126, simple_loss=0.07889, pruned_loss=0.01038, audio_tagging_loss=0.01144, over 15669.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09035, pruned_loss=0.01255, audio_tagging_loss=0.008837, over 3057715.46 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:43:16,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3309246.6666666665, ans=0.0
2023-11-26 08:43:20,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=22.5
2023-11-26 08:43:28,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3309313.3333333335, ans=0.0
2023-11-26 08:43:29,289 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496400
2023-11-26 08:43:48,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3309446.6666666665, ans=0.1
2023-11-26 08:43:59,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3309446.6666666665, ans=0.1
2023-11-26 08:44:01,840 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3450, loss[loss=0.06345, simple_loss=0.09158, pruned_loss=0.01174, audio_tagging_loss=0.005917, over 14803.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09042, pruned_loss=0.0126, audio_tagging_loss=0.008684, over 3054958.07 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:44:21,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3309580.0, ans=0.0
2023-11-26 08:44:24,916 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496450
2023-11-26 08:44:28,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3309646.6666666665, ans=0.1
2023-11-26 08:44:41,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.832e+01 9.547e+01 1.006e+02 1.211e+02, threshold=1.909e+02, percent-clipped=0.0
2023-11-26 08:44:57,763 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3500, loss[loss=0.05061, simple_loss=0.06822, pruned_loss=0.008047, audio_tagging_loss=0.008459, over 16789.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.0907, pruned_loss=0.01275, audio_tagging_loss=0.008641, over 3052472.49 frames. ], batch size: 65, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:45:02,284 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 08:45:05,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3309846.6666666665, ans=0.0
2023-11-26 08:45:20,665 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496500
2023-11-26 08:45:25,961 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 08:45:53,132 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3550, loss[loss=0.06202, simple_loss=0.08318, pruned_loss=0.0117, audio_tagging_loss=0.008724, over 14956.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08973, pruned_loss=0.01248, audio_tagging_loss=0.008731, over 3052599.40 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 08:45:58,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0
2023-11-26 08:46:16,831 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496550
2023-11-26 08:46:33,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.065e+01 8.432e+01 9.183e+01 9.736e+01 1.809e+02, threshold=1.837e+02, percent-clipped=0.0
2023-11-26 08:46:48,259 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3600, loss[loss=0.06176, simple_loss=0.07822, pruned_loss=0.01417, audio_tagging_loss=0.00848, over 14264.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08996, pruned_loss=0.01246, audio_tagging_loss=0.008723, over 3051803.44 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0
2023-11-26 08:46:57,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3310513.3333333335, ans=0.1
2023-11-26 08:47:08,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3310580.0, ans=0.0
2023-11-26 08:47:12,050 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496600
2023-11-26 08:47:28,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3310713.3333333335, ans=0.2
2023-11-26 08:47:33,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0
2023-11-26 08:47:45,311 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3650, loss[loss=0.04865, simple_loss=0.06722, pruned_loss=0.006661, audio_tagging_loss=0.008375, over 15838.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09042, pruned_loss=0.0126, audio_tagging_loss=0.008676, over 3044959.54 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 08:47:51,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3310846.6666666665, ans=0.0
2023-11-26 08:48:05,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0
2023-11-26 08:48:05,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2023-11-26 08:48:08,435 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496650
2023-11-26 08:48:08,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0
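The scaling.py:1022 Whitening lines compare a per-module statistic against a limit; the metric measures how anisotropic the activation covariance is (1.0 would be perfectly white), and modules exceeding their limit are pushed back by a whitening penalty. A rough stand-in for the logged quantity, computed from eigenvalues of the feature covariance (an approximation, not the verbatim implementation):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # mean(eig^2) / mean(eig)^2 of the covariance: 1.0 when all
        # eigenvalues are equal (white), larger when energy concentrates
        # in a few directions.
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n
        eig = torch.linalg.eigvalsh(cov)
        return float((eig ** 2).mean() / eig.mean() ** 2)

    # Near-white activations give a small metric, well under limits like
    # 15.0; strongly correlated ones drive it upward.
    print(whitening_metric(torch.randn(4000, 256)))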
2023-11-26 08:48:11,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3310980.0, ans=0.125
2023-11-26 08:48:24,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3311046.6666666665, ans=0.125
2023-11-26 08:48:27,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3311046.6666666665, ans=0.125
2023-11-26 08:48:29,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.589e+01 9.068e+01 9.988e+01 1.098e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-26 08:48:33,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3311113.3333333335, ans=0.0
2023-11-26 08:48:34,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3311113.3333333335, ans=0.1
2023-11-26 08:48:40,919 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3700, loss[loss=0.08089, simple_loss=0.1071, pruned_loss=0.01826, audio_tagging_loss=0.009076, over 16999.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08892, pruned_loss=0.01231, audio_tagging_loss=0.008727, over 3045475.05 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 8.0
2023-11-26 08:48:57,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3311246.6666666665, ans=0.125
2023-11-26 08:49:01,168 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 08:49:04,823 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496700
2023-11-26 08:49:07,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3311313.3333333335, ans=0.125
2023-11-26 08:49:12,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3311313.3333333335, ans=0.125
2023-11-26 08:49:21,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3311380.0, ans=0.125
2023-11-26 08:49:26,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3311446.6666666665, ans=0.2
2023-11-26 08:49:26,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3311446.6666666665, ans=0.125
2023-11-26 08:49:36,481 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3750, loss[loss=0.0631, simple_loss=0.08837, pruned_loss=0.009772, audio_tagging_loss=0.009139, over 15040.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08918, pruned_loss=0.01238, audio_tagging_loss=0.008751, over 3053719.93 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0
2023-11-26 08:49:51,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3311580.0, ans=0.04949747468305833
2023-11-26 08:50:00,824 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496750
2023-11-26 08:50:02,093 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 08:50:04,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3311646.6666666665, ans=0.1
2023-11-26 08:50:06,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5
2023-11-26 08:50:13,656 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 08:50:20,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.835e+01 9.456e+01 1.002e+02 1.375e+02, threshold=1.891e+02, percent-clipped=0.0
2023-11-26 08:50:33,740 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3800, loss[loss=0.08105, simple_loss=0.1121, pruned_loss=0.01864, audio_tagging_loss=0.006344, over 15196.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09018, pruned_loss=0.01262, audio_tagging_loss=0.008803, over 3054906.38 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 8.0
2023-11-26 08:50:41,489 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 08:50:51,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0
2023-11-26 08:50:56,206 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496800
2023-11-26 08:51:06,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3312046.6666666665, ans=0.2
2023-11-26 08:51:11,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3312046.6666666665, ans=0.125
2023-11-26 08:51:29,053 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3850, loss[loss=0.04563, simple_loss=0.05962, pruned_loss=0.006968, audio_tagging_loss=0.008846, over 16572.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08943, pruned_loss=0.01241, audio_tagging_loss=0.00895, over 3052601.29 frames. ], batch size: 65, lr: 1.61e-03, grad_scale: 8.0
2023-11-26 08:51:32,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3312180.0, ans=0.2
2023-11-26 08:51:46,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3312246.6666666665, ans=0.2
2023-11-26 08:51:52,571 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496850
2023-11-26 08:52:06,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3312380.0, ans=0.0
2023-11-26 08:52:12,780 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.753e+01 9.436e+01 1.032e+02 1.247e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 08:52:24,966 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3900, loss[loss=0.06319, simple_loss=0.08922, pruned_loss=0.01072, audio_tagging_loss=0.007858, over 15267.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08945, pruned_loss=0.01244, audio_tagging_loss=0.008902, over 3042223.63 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 8.0
2023-11-26 08:52:34,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3312513.3333333335, ans=0.04949747468305833
2023-11-26 08:52:34,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0
2023-11-26 08:52:46,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0
2023-11-26 08:52:49,033 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496900
2023-11-26 08:52:53,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3312646.6666666665, ans=0.125
2023-11-26 08:52:56,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.38 vs. limit=22.5
2023-11-26 08:53:21,511 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3950, loss[loss=0.09109, simple_loss=0.1197, pruned_loss=0.02254, audio_tagging_loss=0.008691, over 15595.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08903, pruned_loss=0.01242, audio_tagging_loss=0.009064, over 3031204.07 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 8.0
2023-11-26 08:53:23,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312846.6666666665, ans=0.1
2023-11-26 08:53:30,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312846.6666666665, ans=0.1
2023-11-26 08:53:40,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3312913.3333333335, ans=0.95
2023-11-26 08:53:44,604 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496950
2023-11-26 08:53:57,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3313046.6666666665, ans=0.1
2023-11-26 08:54:05,143 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.800e+01 8.614e+01 9.558e+01 1.040e+02 1.255e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-26 08:54:06,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3313113.3333333335, ans=0.05
2023-11-26 08:54:07,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3313113.3333333335, ans=0.125
2023-11-26 08:54:17,372 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4000, loss[loss=0.04665, simple_loss=0.05766, pruned_loss=0.005301, audio_tagging_loss=0.01252, over 14757.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08856, pruned_loss=0.01237, audio_tagging_loss=0.009118, over 3038103.19 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 08:54:41,100 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497000
2023-11-26 08:54:42,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3313313.3333333335, ans=0.0
2023-11-26 08:54:43,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3313313.3333333335, ans=0.04949747468305833
2023-11-26 08:55:13,080 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4050, loss[loss=0.06672, simple_loss=0.09461, pruned_loss=0.01088, audio_tagging_loss=0.008534, over 15607.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08939, pruned_loss=0.01239, audio_tagging_loss=0.00909, over 3041362.03 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 08:55:14,670 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
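The grad_scale field in the loss summaries is the fp16 loss scale maintained by the AMP grad scaler: it halves when a step produces inf/nan gradients and doubles again after a run of finite steps, which is why it moves between 8.0, 16.0 and 32.0 across the batches above. The standard PyTorch mechanism looks like this (growth_interval below is illustrative; the recipe may configure it differently):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,       # scale seen in this stretch of the log
        growth_factor=2.0,     # double after growth_interval clean steps
        backoff_factor=0.5,    # halve on inf/nan gradients
        growth_interval=2000,  # illustrative value
    )
    # Per training step:
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()  # grows or backs off the scale automatically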
Number of tokens: 24 2023-11-26 08:55:32,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3313580.0, ans=0.125 2023-11-26 08:55:37,103 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497050 2023-11-26 08:55:38,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-26 08:55:50,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3313713.3333333335, ans=0.0 2023-11-26 08:55:57,328 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.799e+01 9.457e+01 1.021e+02 1.367e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 08:55:57,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3313780.0, ans=0.125 2023-11-26 08:56:04,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2023-11-26 08:56:05,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3313780.0, ans=0.125 2023-11-26 08:56:09,658 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4100, loss[loss=0.06033, simple_loss=0.08713, pruned_loss=0.008817, audio_tagging_loss=0.007944, over 14152.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08972, pruned_loss=0.01237, audio_tagging_loss=0.009018, over 3040449.80 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:56:12,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3313846.6666666665, ans=0.0 2023-11-26 08:56:33,315 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497100 2023-11-26 08:56:36,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3313980.0, ans=0.04949747468305833 2023-11-26 08:56:39,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3313980.0, ans=0.125 2023-11-26 08:56:52,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3314046.6666666665, ans=0.0 2023-11-26 08:57:05,854 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4150, loss[loss=0.07221, simple_loss=0.1005, pruned_loss=0.01368, audio_tagging_loss=0.008265, over 15394.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09005, pruned_loss=0.01241, audio_tagging_loss=0.00886, over 3048460.99 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:57:06,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=22.5 2023-11-26 08:57:18,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3314246.6666666665, ans=0.125 2023-11-26 08:57:22,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.99 vs. 
limit=22.5 2023-11-26 08:57:29,897 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497150 2023-11-26 08:57:31,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3314313.3333333335, ans=0.07 2023-11-26 08:57:41,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3314380.0, ans=0.125 2023-11-26 08:57:45,164 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:57:49,402 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.533e+01 8.973e+01 9.473e+01 1.014e+02 1.383e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 08:57:51,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3314446.6666666665, ans=0.125 2023-11-26 08:57:53,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3314446.6666666665, ans=0.0 2023-11-26 08:57:59,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3314446.6666666665, ans=0.125 2023-11-26 08:57:59,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3314446.6666666665, ans=0.1 2023-11-26 08:58:01,786 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4200, loss[loss=0.07451, simple_loss=0.09601, pruned_loss=0.01506, audio_tagging_loss=0.01145, over 15216.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09027, pruned_loss=0.01251, audio_tagging_loss=0.008707, over 3043450.63 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:58:06,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3314513.3333333335, ans=0.07 2023-11-26 08:58:25,592 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497200 2023-11-26 08:58:47,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3314780.0, ans=0.125 2023-11-26 08:58:49,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3314780.0, ans=0.125 2023-11-26 08:58:50,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3314780.0, ans=0.0 2023-11-26 08:58:50,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=15.0 2023-11-26 08:58:58,270 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4250, loss[loss=0.08215, simple_loss=0.1092, pruned_loss=0.02016, audio_tagging_loss=0.007409, over 14810.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09062, pruned_loss=0.01262, audio_tagging_loss=0.008758, over 3039047.30 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:59:07,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3314846.6666666665, ans=0.125 2023-11-26 08:59:14,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3314913.3333333335, ans=0.0 2023-11-26 08:59:15,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3314913.3333333335, ans=0.125 2023-11-26 08:59:19,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2023-11-26 08:59:21,167 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497250 2023-11-26 08:59:25,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3314980.0, ans=0.125 2023-11-26 08:59:28,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3314980.0, ans=0.125 2023-11-26 08:59:40,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3315046.6666666665, ans=0.125 2023-11-26 08:59:41,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.666e+01 9.281e+01 9.909e+01 1.116e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 08:59:51,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.19 vs. limit=5.0 2023-11-26 08:59:54,039 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4300, loss[loss=0.06827, simple_loss=0.08876, pruned_loss=0.01197, audio_tagging_loss=0.01192, over 15177.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09029, pruned_loss=0.0126, audio_tagging_loss=0.008746, over 3046965.53 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:59:57,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315180.0, ans=0.1 2023-11-26 08:59:57,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-26 09:00:07,563 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:00:17,446 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497300 2023-11-26 09:00:24,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3315313.3333333335, ans=0.125 2023-11-26 09:00:48,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3315513.3333333335, ans=0.09899494936611666 2023-11-26 09:00:49,353 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4350, loss[loss=0.07084, simple_loss=0.09597, pruned_loss=0.01253, audio_tagging_loss=0.01032, over 14392.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09138, pruned_loss=0.01269, audio_tagging_loss=0.008677, over 3050358.66 frames. 
], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:00:54,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3315513.3333333335, ans=0.1 2023-11-26 09:00:58,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3315513.3333333335, ans=0.125 2023-11-26 09:01:05,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3315580.0, ans=0.0 2023-11-26 09:01:14,002 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497350 2023-11-26 09:01:33,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.733e+01 9.373e+01 9.862e+01 1.351e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 09:01:33,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3315780.0, ans=0.0 2023-11-26 09:01:35,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3315780.0, ans=0.5 2023-11-26 09:01:46,596 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4400, loss[loss=0.05952, simple_loss=0.08405, pruned_loss=0.00956, audio_tagging_loss=0.007939, over 15306.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09073, pruned_loss=0.0126, audio_tagging_loss=0.008627, over 3050110.45 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:02:04,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3315913.3333333335, ans=0.125 2023-11-26 09:02:05,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3315913.3333333335, ans=15.0 2023-11-26 09:02:09,313 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497400 2023-11-26 09:02:10,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3315980.0, ans=0.0 2023-11-26 09:02:27,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3316046.6666666665, ans=0.04949747468305833 2023-11-26 09:02:42,686 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4450, loss[loss=0.08686, simple_loss=0.1283, pruned_loss=0.01951, audio_tagging_loss=0.003218, over 14476.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0906, pruned_loss=0.01246, audio_tagging_loss=0.008607, over 3050460.72 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:02:49,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.15 vs. 
limit=15.0 2023-11-26 09:02:52,547 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:02:54,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3316246.6666666665, ans=0.07 2023-11-26 09:02:59,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3316246.6666666665, ans=0.0 2023-11-26 09:03:06,113 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497450 2023-11-26 09:03:14,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2023-11-26 09:03:17,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3316380.0, ans=0.09899494936611666 2023-11-26 09:03:26,125 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.847e+01 8.913e+01 9.793e+01 1.057e+02 1.326e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-26 09:03:27,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3316446.6666666665, ans=0.07 2023-11-26 09:03:29,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3316446.6666666665, ans=0.09899494936611666 2023-11-26 09:03:37,887 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4500, loss[loss=0.05818, simple_loss=0.08379, pruned_loss=0.01029, audio_tagging_loss=0.005997, over 16105.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09102, pruned_loss=0.01263, audio_tagging_loss=0.008655, over 3053475.20 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:03:40,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3316513.3333333335, ans=0.1 2023-11-26 09:04:02,414 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497500 2023-11-26 09:04:11,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3316713.3333333335, ans=0.125 2023-11-26 09:04:27,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3316780.0, ans=0.2 2023-11-26 09:04:34,323 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4550, loss[loss=0.0691, simple_loss=0.09993, pruned_loss=0.01024, audio_tagging_loss=0.008894, over 15393.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08969, pruned_loss=0.01239, audio_tagging_loss=0.008701, over 3045598.57 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:04:40,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3316846.6666666665, ans=0.125 2023-11-26 09:04:41,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3316846.6666666665, ans=0.125 2023-11-26 09:04:52,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.31 vs. 
limit=15.0 2023-11-26 09:04:55,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3316980.0, ans=0.125 2023-11-26 09:04:57,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=12.0 2023-11-26 09:04:58,027 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497550 2023-11-26 09:05:16,136 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:05:19,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.805e+01 9.410e+01 9.881e+01 1.547e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 09:05:27,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3317113.3333333335, ans=0.1 2023-11-26 09:05:31,126 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4600, loss[loss=0.05946, simple_loss=0.07621, pruned_loss=0.009828, audio_tagging_loss=0.01153, over 14291.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08894, pruned_loss=0.01228, audio_tagging_loss=0.008759, over 3044193.71 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:05:39,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2023-11-26 09:05:46,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=12.0 2023-11-26 09:05:53,463 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497600 2023-11-26 09:06:08,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3317380.0, ans=0.2 2023-11-26 09:06:15,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3317446.6666666665, ans=0.125 2023-11-26 09:06:27,044 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4650, loss[loss=0.07162, simple_loss=0.09785, pruned_loss=0.0155, audio_tagging_loss=0.007195, over 15529.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08951, pruned_loss=0.0123, audio_tagging_loss=0.008782, over 3046030.12 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:06:36,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3317580.0, ans=0.125 2023-11-26 09:06:43,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3317580.0, ans=0.125 2023-11-26 09:06:50,794 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497650 2023-11-26 09:06:54,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3317646.6666666665, ans=0.2 2023-11-26 09:07:08,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2023-11-26 09:07:12,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.826e+01 9.427e+01 1.038e+02 1.331e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 09:07:19,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3317780.0, ans=0.125 2023-11-26 09:07:22,792 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4700, loss[loss=0.055, simple_loss=0.06312, pruned_loss=0.01263, audio_tagging_loss=0.0108, over 14910.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08927, pruned_loss=0.01232, audio_tagging_loss=0.008878, over 3045739.90 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:07:44,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.42 vs. limit=22.5 2023-11-26 09:07:44,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0 2023-11-26 09:07:46,145 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497700 2023-11-26 09:07:47,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=22.5 2023-11-26 09:07:58,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3318046.6666666665, ans=0.04949747468305833 2023-11-26 09:08:10,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3318113.3333333335, ans=0.1 2023-11-26 09:08:18,965 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4750, loss[loss=0.07695, simple_loss=0.1003, pruned_loss=0.01829, audio_tagging_loss=0.008519, over 14591.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08965, pruned_loss=0.01235, audio_tagging_loss=0.008935, over 3044854.86 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:08:31,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3318246.6666666665, ans=0.125 2023-11-26 09:08:34,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.87 vs. 
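
Note on the optim.py lines: the five numbers after "grad-norm quartiles" are the min/25%/median/75%/max of recent gradient norms, and the clipping threshold tracks Clipping_scale times the running median (2.0 x 9.427e+01 = 1.885e+02 above); percent-clipped reports how often the norm exceeded it. A reduced sketch of that scheme (the history length is an assumption):

import collections
import statistics
import torch

class MedianClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 200):
        self.scale = clipping_scale
        self.norms = collections.deque(maxlen=history)
        self.num_clipped = 0
        self.num_steps = 0

    def clip_(self, params) -> float:
        params = list(params)
        grads = [p.grad.norm() for p in params if p.grad is not None]
        norm = torch.norm(torch.stack(grads)).item()  # global l2 norm
        self.norms.append(norm)
        threshold = self.scale * statistics.median(self.norms)
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return threshold  # compare with the logged "threshold=..." values
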
limit=22.5 2023-11-26 09:08:36,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3318246.6666666665, ans=0.125 2023-11-26 09:08:41,260 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497750 2023-11-26 09:08:41,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3318313.3333333335, ans=0.1 2023-11-26 09:08:43,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3318313.3333333335, ans=0.125 2023-11-26 09:08:47,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3318313.3333333335, ans=0.09899494936611666 2023-11-26 09:09:04,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.590e+01 9.271e+01 9.941e+01 1.309e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 09:09:14,042 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4800, loss[loss=0.06006, simple_loss=0.07514, pruned_loss=0.01272, audio_tagging_loss=0.009772, over 13581.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08982, pruned_loss=0.01247, audio_tagging_loss=0.009028, over 3043538.46 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:09:16,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3318513.3333333335, ans=0.125 2023-11-26 09:09:31,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3318580.0, ans=0.125 2023-11-26 09:09:32,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3318580.0, ans=0.0 2023-11-26 09:09:37,492 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497800 2023-11-26 09:09:37,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3318646.6666666665, ans=0.05 2023-11-26 09:09:43,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3318646.6666666665, ans=0.0 2023-11-26 09:09:44,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2023-11-26 09:09:54,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3318713.3333333335, ans=0.2 2023-11-26 09:10:00,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3318780.0, ans=0.125 2023-11-26 09:10:10,115 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4850, loss[loss=0.05421, simple_loss=0.07319, pruned_loss=0.009552, audio_tagging_loss=0.008066, over 14805.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08996, pruned_loss=0.0126, audio_tagging_loss=0.00908, over 3040787.22 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:10:33,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.97 vs. 
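
Note on grad_scale: the value logged with each batch (32.0, 16.0, 8.0, ...) is the fp16 loss scale, which dynamic loss scaling halves when a step overflows and grows back after a run of clean steps. A sketch of the corresponding PyTorch AMP loop (init_scale and growth_interval here are assumptions, and the step function is illustrative):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def training_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # silently skipped if gradients overflowed
    scaler.update()          # adjusts the scale reported as "grad_scale"
    return loss.detach(), scaler.get_scale()
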
limit=15.0 2023-11-26 09:10:34,002 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497850 2023-11-26 09:10:40,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-11-26 09:10:41,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2023-11-26 09:10:55,695 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.672e+01 9.359e+01 1.001e+02 1.200e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:11:05,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3319180.0, ans=0.125 2023-11-26 09:11:06,513 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4900, loss[loss=0.06104, simple_loss=0.08401, pruned_loss=0.008018, audio_tagging_loss=0.01102, over 16581.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09028, pruned_loss=0.01267, audio_tagging_loss=0.009072, over 3041365.12 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:11:09,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2023-11-26 09:11:17,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3319246.6666666665, ans=0.125 2023-11-26 09:11:28,845 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497900 2023-11-26 09:11:33,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3319313.3333333335, ans=10.0 2023-11-26 09:11:43,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.23 vs. limit=22.5 2023-11-26 09:12:01,623 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4950, loss[loss=0.0548, simple_loss=0.07836, pruned_loss=0.008441, audio_tagging_loss=0.007181, over 14873.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.0902, pruned_loss=0.01258, audio_tagging_loss=0.008816, over 3043373.96 frames. 
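
Note on the Whitening lines: the metric measures how far a module's channel covariance is from a multiple of the identity (1.0 means perfectly white) and is compared against a scheduled whitening_limit; when the limit is exceeded, the module applies a corrective gradient. One illustrative way to compute such a metric (the exact formula in scaling.py may differ):

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one group.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    # Ratio of second moment to squared mean of eigenvalues: 1.0 when
    # all eigenvalues are equal, larger as the spectrum spreads out.
    return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))

x = torch.randn(2000, 512)
print(whitening_metric(x))                                  # ~1.0, near-white
print(whitening_metric(x * torch.linspace(0.1, 3.0, 512)))  # noticeably larger
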
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:12:08,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3319513.3333333335, ans=0.0 2023-11-26 09:12:16,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3319580.0, ans=0.125 2023-11-26 09:12:24,931 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497950 2023-11-26 09:12:31,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3319646.6666666665, ans=0.125 2023-11-26 09:12:34,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3319713.3333333335, ans=0.1 2023-11-26 09:12:46,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.822e+01 9.528e+01 1.003e+02 1.233e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 09:12:54,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3319780.0, ans=0.125 2023-11-26 09:12:56,702 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5000, loss[loss=0.07035, simple_loss=0.08826, pruned_loss=0.01788, audio_tagging_loss=0.008343, over 16609.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.0908, pruned_loss=0.0128, audio_tagging_loss=0.008709, over 3049460.73 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:12:57,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2023-11-26 09:13:15,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3319913.3333333335, ans=0.125 2023-11-26 09:13:21,431 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498000 2023-11-26 09:13:33,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3320046.6666666665, ans=0.0 2023-11-26 09:13:52,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3320180.0, ans=0.125 2023-11-26 09:13:53,749 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5050, loss[loss=0.06673, simple_loss=0.09502, pruned_loss=0.01207, audio_tagging_loss=0.007151, over 14867.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09106, pruned_loss=0.01273, audio_tagging_loss=0.008655, over 3053132.30 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:13:55,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0 2023-11-26 09:13:56,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0 2023-11-26 09:14:12,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.38 vs. 
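
Note on loss[...] versus tot_loss[...]: the first bracket is the current batch, the second a running frame-weighted aggregate; the fractional frame counts ("over 3053475.20 frames") indicate an exponentially decayed weighted sum rather than a plain window. A sketch of that bookkeeping (the decay constant is assumed):

class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.frames = 0.0
        self.sums = {}

    def update(self, losses: dict, num_frames: float):
        self.frames = self.frames * self.decay + num_frames
        for name, value in losses.items():
            self.sums[name] = (self.sums.get(name, 0.0) * self.decay
                               + value * num_frames)

    def report(self) -> dict:
        # Per-frame averages; self.frames is the decayed count printed
        # after "over ... frames" in the tot_loss bracket.
        return {name: s / self.frames for name, s in self.sums.items()}
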
limit=10.0 2023-11-26 09:14:17,115 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498050 2023-11-26 09:14:19,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3320313.3333333335, ans=0.0 2023-11-26 09:14:24,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-11-26 09:14:39,864 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.596e+01 9.338e+01 9.985e+01 1.178e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 09:14:42,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3320446.6666666665, ans=0.2 2023-11-26 09:14:49,961 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5100, loss[loss=0.07168, simple_loss=0.1002, pruned_loss=0.01291, audio_tagging_loss=0.008644, over 16947.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09056, pruned_loss=0.01267, audio_tagging_loss=0.008629, over 3051815.32 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:15:02,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3320580.0, ans=0.125 2023-11-26 09:15:05,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3320580.0, ans=0.0 2023-11-26 09:15:12,787 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498100 2023-11-26 09:15:24,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0 2023-11-26 09:15:26,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2023-11-26 09:15:45,338 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5150, loss[loss=0.06054, simple_loss=0.07659, pruned_loss=0.01155, audio_tagging_loss=0.01069, over 15626.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08991, pruned_loss=0.01255, audio_tagging_loss=0.008664, over 3047919.51 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:16:09,434 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498150 2023-11-26 09:16:22,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3321046.6666666665, ans=0.2 2023-11-26 09:16:23,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2023-11-26 09:16:31,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.833e+01 9.269e+01 1.033e+02 1.245e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 09:16:39,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3321113.3333333335, ans=0.0 2023-11-26 09:16:41,829 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5200, loss[loss=0.06298, simple_loss=0.08713, pruned_loss=0.01322, audio_tagging_loss=0.006196, over 15377.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09055, pruned_loss=0.01257, audio_tagging_loss=0.008585, over 3047719.58 frames. 
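
Note on the lr field: it stays pinned near 1.61e-03 across these batches, consistent with an Eden-style schedule in which the rate decays smoothly with both batch and epoch counts. The settings below are assumptions chosen because they reproduce the logged value around batch 498000 in epoch 42:

def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

print(eden_lr(0.045, batch=498000, epoch=42))  # ~1.6e-03, matching the log
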
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:17:03,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3321313.3333333335, ans=0.0 2023-11-26 09:17:05,250 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498200 2023-11-26 09:17:22,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3321380.0, ans=0.0 2023-11-26 09:17:32,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3321446.6666666665, ans=0.2 2023-11-26 09:17:37,759 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5250, loss[loss=0.08371, simple_loss=0.1134, pruned_loss=0.0198, audio_tagging_loss=0.007188, over 16180.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09125, pruned_loss=0.01254, audio_tagging_loss=0.00846, over 3044126.70 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:17:41,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3321513.3333333335, ans=0.2 2023-11-26 09:17:52,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3321580.0, ans=0.0 2023-11-26 09:18:01,133 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498250 2023-11-26 09:18:14,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.21 vs. limit=22.5 2023-11-26 09:18:17,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3321713.3333333335, ans=0.0 2023-11-26 09:18:19,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3321713.3333333335, ans=0.0 2023-11-26 09:18:22,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3321780.0, ans=0.05 2023-11-26 09:18:22,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3321780.0, ans=0.0 2023-11-26 09:18:23,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.788e+01 9.359e+01 1.015e+02 1.795e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:18:33,631 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5300, loss[loss=0.08988, simple_loss=0.1313, pruned_loss=0.01614, audio_tagging_loss=0.008085, over 15019.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09191, pruned_loss=0.01258, audio_tagging_loss=0.008499, over 3043429.83 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:18:50,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.54 vs. 
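
Note on the loss components: the printed numbers satisfy loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (e.g. 0.5 x 0.1313 + 0.01614 + 0.008085 = 0.08988 for the batch 5300 sample above), so the simple-loss scale is 0.5 and the audio-tagging scale 1.0 at this stage of training. As a check:

def total_loss(simple_loss: float, pruned_loss: float,
               audio_tagging_loss: float,
               simple_loss_scale: float = 0.5,
               audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

print(total_loss(0.1313, 0.01614, 0.008085))  # ~0.08988, as logged
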
limit=15.0 2023-11-26 09:18:51,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3321913.3333333335, ans=0.1 2023-11-26 09:18:55,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3321980.0, ans=0.2 2023-11-26 09:18:57,465 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498300 2023-11-26 09:19:01,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3321980.0, ans=0.125 2023-11-26 09:19:14,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3322046.6666666665, ans=0.125 2023-11-26 09:19:29,810 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5350, loss[loss=0.06649, simple_loss=0.09189, pruned_loss=0.01273, audio_tagging_loss=0.007812, over 15325.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09184, pruned_loss=0.01258, audio_tagging_loss=0.008542, over 3047731.56 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:19:40,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3322246.6666666665, ans=0.0 2023-11-26 09:19:45,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3322246.6666666665, ans=0.1 2023-11-26 09:19:52,701 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498350 2023-11-26 09:20:00,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2023-11-26 09:20:01,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3322380.0, ans=0.1 2023-11-26 09:20:01,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3322380.0, ans=0.0 2023-11-26 09:20:14,674 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:20:16,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.636e+01 9.507e+01 1.021e+02 1.196e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 09:20:16,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3322446.6666666665, ans=0.2 2023-11-26 09:20:20,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2023-11-26 09:20:22,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3322446.6666666665, ans=0.125 2023-11-26 09:20:25,611 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5400, loss[loss=0.08307, simple_loss=0.112, pruned_loss=0.01963, audio_tagging_loss=0.007438, over 14808.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09299, pruned_loss=0.01296, audio_tagging_loss=0.008584, over 3047516.32 frames. 
], batch size: 52, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:20:45,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3322580.0, ans=0.05 2023-11-26 09:20:48,969 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498400 2023-11-26 09:20:55,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0 2023-11-26 09:21:00,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2023-11-26 09:21:01,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3322713.3333333335, ans=0.125 2023-11-26 09:21:05,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3322713.3333333335, ans=0.2 2023-11-26 09:21:07,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2023-11-26 09:21:14,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3322780.0, ans=0.125 2023-11-26 09:21:20,962 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5450, loss[loss=0.06177, simple_loss=0.08997, pruned_loss=0.008607, audio_tagging_loss=0.008172, over 15195.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09237, pruned_loss=0.01297, audio_tagging_loss=0.008678, over 3041872.18 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:21:30,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2023-11-26 09:21:35,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3322913.3333333335, ans=0.0 2023-11-26 09:21:37,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3322913.3333333335, ans=0.125 2023-11-26 09:21:45,564 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498450 2023-11-26 09:21:49,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2023-11-26 09:21:50,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3322980.0, ans=0.125 2023-11-26 09:21:54,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3323046.6666666665, ans=0.2 2023-11-26 09:21:55,297 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:21:55,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.60 vs. 
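
Note on the balancer fields (prob, min_positive, max_abs, ...): Balancer modules keep activation statistics inside bounds, for example the fraction of positive values and the mean absolute value, by injecting a small corrective gradient; prob (itself scheduled, logged as 0.125 here) is the probability that the check runs on a given batch. A heavily reduced sketch of the mechanism (constants are illustrative, the real logic is in scaling.py):

import torch

class TinyBalancer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, min_positive=0.05, max_abs=10.0):
        ctx.save_for_backward(x)
        ctx.min_positive, ctx.max_abs = min_positive, max_abs
        return x

    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        penalty = torch.zeros_like(x)
        if (x > 0).float().mean() < ctx.min_positive:
            penalty -= 1e-4                      # nudge activations upward
        # Push over-large activations back toward zero.
        penalty = penalty + 1e-4 * (x.abs() > ctx.max_abs) * x.sign()
        return grad + penalty, None, None
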
limit=15.0 2023-11-26 09:22:01,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3323046.6666666665, ans=0.0 2023-11-26 09:22:08,292 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.615e+01 9.390e+01 1.017e+02 1.312e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 09:22:09,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3323113.3333333335, ans=0.2 2023-11-26 09:22:17,207 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5500, loss[loss=0.05752, simple_loss=0.07446, pruned_loss=0.009532, audio_tagging_loss=0.01075, over 15718.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09197, pruned_loss=0.01293, audio_tagging_loss=0.00876, over 3049453.35 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:22:26,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3323180.0, ans=0.125 2023-11-26 09:22:26,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0 2023-11-26 09:22:40,718 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498500 2023-11-26 09:22:54,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3323380.0, ans=0.025 2023-11-26 09:23:00,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3323380.0, ans=0.0 2023-11-26 09:23:13,562 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5550, loss[loss=0.06783, simple_loss=0.08905, pruned_loss=0.01215, audio_tagging_loss=0.01115, over 15604.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09222, pruned_loss=0.01293, audio_tagging_loss=0.008846, over 3049437.99 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:23:13,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3323513.3333333335, ans=0.125 2023-11-26 09:23:19,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3323513.3333333335, ans=0.125 2023-11-26 09:23:24,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3323580.0, ans=0.0 2023-11-26 09:23:28,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3323580.0, ans=0.0 2023-11-26 09:23:37,062 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498550 2023-11-26 09:23:48,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3323713.3333333335, ans=0.1 2023-11-26 09:23:52,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3323713.3333333335, ans=0.125 2023-11-26 09:23:57,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3323780.0, ans=0.1 2023-11-26 09:24:00,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.769e+01 8.665e+01 9.160e+01 9.875e+01 1.167e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 09:24:01,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3323780.0, ans=0.0 2023-11-26 09:24:07,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3323846.6666666665, ans=0.1 2023-11-26 09:24:08,701 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5600, loss[loss=0.07439, simple_loss=0.09754, pruned_loss=0.01666, audio_tagging_loss=0.008962, over 16354.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09055, pruned_loss=0.01266, audio_tagging_loss=0.008984, over 3046422.43 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:24:32,117 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498600 2023-11-26 09:24:39,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3323980.0, ans=0.125 2023-11-26 09:24:47,736 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:24:49,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3324046.6666666665, ans=0.0 2023-11-26 09:24:53,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3324113.3333333335, ans=0.0 2023-11-26 09:25:04,648 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5650, loss[loss=0.08147, simple_loss=0.111, pruned_loss=0.01849, audio_tagging_loss=0.007471, over 14112.00 frames. 
], tot_loss[loss=0.06659, simple_loss=0.09003, pruned_loss=0.01263, audio_tagging_loss=0.008945, over 3051045.77 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:25:13,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3324180.0, ans=0.2 2023-11-26 09:25:28,001 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498650 2023-11-26 09:25:29,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3324313.3333333335, ans=0.2 2023-11-26 09:25:30,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3324313.3333333335, ans=0.0 2023-11-26 09:25:36,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3324380.0, ans=0.125 2023-11-26 09:25:51,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.711e+01 9.394e+01 1.016e+02 1.261e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 09:25:57,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3324446.6666666665, ans=0.125 2023-11-26 09:25:58,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3324446.6666666665, ans=0.125 2023-11-26 09:26:00,616 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5700, loss[loss=0.06843, simple_loss=0.09248, pruned_loss=0.01481, audio_tagging_loss=0.007386, over 15018.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08899, pruned_loss=0.01249, audio_tagging_loss=0.008984, over 3047144.89 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:26:18,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3324580.0, ans=0.0 2023-11-26 09:26:22,927 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498700 2023-11-26 09:26:30,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3324646.6666666665, ans=0.125 2023-11-26 09:26:46,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3324780.0, ans=0.125 2023-11-26 09:26:53,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0 2023-11-26 09:26:55,478 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5750, loss[loss=0.093, simple_loss=0.1272, pruned_loss=0.02352, audio_tagging_loss=0.005895, over 15457.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08922, pruned_loss=0.01255, audio_tagging_loss=0.008881, over 3049340.26 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:27:03,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.88 vs. 
limit=15.0 2023-11-26 09:27:05,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3324913.3333333335, ans=0.0 2023-11-26 09:27:16,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3324913.3333333335, ans=0.0 2023-11-26 09:27:17,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=12.0 2023-11-26 09:27:18,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0 2023-11-26 09:27:19,373 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498750 2023-11-26 09:27:30,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3325046.6666666665, ans=0.125 2023-11-26 09:27:43,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.688e+01 9.672e+01 1.037e+02 1.412e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 09:27:51,156 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5800, loss[loss=0.06254, simple_loss=0.09117, pruned_loss=0.007917, audio_tagging_loss=0.009035, over 16236.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08827, pruned_loss=0.01234, audio_tagging_loss=0.008851, over 3043894.29 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:28:15,010 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498800 2023-11-26 09:28:28,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3325380.0, ans=0.2 2023-11-26 09:28:46,784 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5850, loss[loss=0.08212, simple_loss=0.1145, pruned_loss=0.01622, audio_tagging_loss=0.008659, over 14262.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08759, pruned_loss=0.01207, audio_tagging_loss=0.008884, over 3040994.77 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:28:48,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3325513.3333333335, ans=0.1 2023-11-26 09:28:51,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3325513.3333333335, ans=0.125 2023-11-26 09:28:59,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3325580.0, ans=0.125 2023-11-26 09:29:09,681 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498850 2023-11-26 09:29:35,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 8.712e+01 9.460e+01 1.009e+02 5.552e+02, threshold=1.892e+02, percent-clipped=1.0 2023-11-26 09:29:42,433 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5900, loss[loss=0.06011, simple_loss=0.08522, pruned_loss=0.007827, audio_tagging_loss=0.009675, over 15446.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08836, pruned_loss=0.01203, audio_tagging_loss=0.008827, over 3043888.56 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:29:49,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3325846.6666666665, ans=0.025 2023-11-26 09:29:55,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3325913.3333333335, ans=0.0 2023-11-26 09:29:56,954 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:30:05,948 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498900 2023-11-26 09:30:20,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2023-11-26 09:30:26,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3326113.3333333335, ans=0.125 2023-11-26 09:30:33,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3326113.3333333335, ans=22.5 2023-11-26 09:30:37,534 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5950, loss[loss=0.08089, simple_loss=0.1134, pruned_loss=0.01869, audio_tagging_loss=0.005506, over 15532.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09038, pruned_loss=0.01228, audio_tagging_loss=0.008624, over 3052790.20 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:30:40,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3326180.0, ans=0.09899494936611666 2023-11-26 09:30:42,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3326180.0, ans=0.125 2023-11-26 09:30:43,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3326180.0, ans=0.125 2023-11-26 09:30:49,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3326246.6666666665, ans=0.125 2023-11-26 09:30:53,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3326246.6666666665, ans=0.2 2023-11-26 09:30:59,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3326246.6666666665, ans=0.1 2023-11-26 09:31:01,973 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498950 2023-11-26 09:31:18,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3326380.0, ans=0.1 2023-11-26 09:31:25,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.795e+01 9.324e+01 1.011e+02 1.404e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 09:31:33,850 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6000, loss[loss=0.04311, simple_loss=0.057, pruned_loss=0.006059, audio_tagging_loss=0.008557, over 14902.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09086, pruned_loss=0.01241, audio_tagging_loss=0.00862, over 3053897.89 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:31:33,851 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 09:32:06,553 INFO [train_asr.py:1267] (1/4) Epoch 42, validation: loss=0.05807, simple_loss=0.05064, pruned_loss=0.005286, audio_tagging_loss=0.02746, over 4681554.00 frames. 2023-11-26 09:32:06,553 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 09:32:29,927 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499000 2023-11-26 09:32:45,652 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:32:53,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3326780.0, ans=0.125 2023-11-26 09:32:54,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.97 vs. limit=22.5 2023-11-26 09:33:01,927 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6050, loss[loss=0.08605, simple_loss=0.1119, pruned_loss=0.01964, audio_tagging_loss=0.01048, over 16053.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0907, pruned_loss=0.01248, audio_tagging_loss=0.008528, over 3052354.78 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:33:14,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3326913.3333333335, ans=0.125 2023-11-26 09:33:25,936 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499050 2023-11-26 09:33:44,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3327046.6666666665, ans=0.05 2023-11-26 09:33:44,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3327046.6666666665, ans=0.0 2023-11-26 09:33:49,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.694e+01 9.426e+01 1.019e+02 1.507e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 09:33:52,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3327113.3333333335, ans=0.0 2023-11-26 09:33:53,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3327113.3333333335, ans=0.035 2023-11-26 09:33:58,295 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6100, loss[loss=0.06203, simple_loss=0.0912, pruned_loss=0.009126, audio_tagging_loss=0.007302, over 15201.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09089, pruned_loss=0.01247, audio_tagging_loss=0.008582, over 3050184.80 frames. 
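
Note on the validation block above: every valid_interval batches the loop switches to eval mode, averages the same loss components over the fixed dev set (hence the constant "over 4681554.00 frames"), and reports peak GPU memory. A sketch (helper names other than the torch calls are illustrative):

import torch

def run_validation(model, dev_loader, compute_loss, device):
    model.eval()
    totals, frames = {}, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            losses, num_frames = compute_loss(model, batch)
            frames += num_frames
            for name, value in losses.items():
                totals[name] = totals.get(name, 0.0) + value * num_frames
    model.train()
    report = {name: v / frames for name, v in totals.items()}
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return report, mem_mb  # cf. "Maximum memory allocated so far is 25568MB"
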
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:34:03,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3327180.0, ans=0.125 2023-11-26 09:34:07,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3327180.0, ans=0.0 2023-11-26 09:34:09,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3327246.6666666665, ans=0.0 2023-11-26 09:34:21,294 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499100 2023-11-26 09:34:38,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3327380.0, ans=0.2 2023-11-26 09:34:54,322 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6150, loss[loss=0.06046, simple_loss=0.07628, pruned_loss=0.01131, audio_tagging_loss=0.01101, over 14679.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08999, pruned_loss=0.01226, audio_tagging_loss=0.008745, over 3046323.22 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:34:57,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5 2023-11-26 09:35:17,632 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499150 2023-11-26 09:35:34,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3327713.3333333335, ans=0.0 2023-11-26 09:35:35,796 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:35:42,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.630e+01 9.202e+01 1.002e+02 1.257e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 09:35:49,827 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6200, loss[loss=0.08429, simple_loss=0.1245, pruned_loss=0.01641, audio_tagging_loss=0.005637, over 15010.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09002, pruned_loss=0.01215, audio_tagging_loss=0.008719, over 3047608.70 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:36:06,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3327913.3333333335, ans=0.09899494936611666 2023-11-26 09:36:13,306 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499200 2023-11-26 09:36:46,678 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6250, loss[loss=0.03885, simple_loss=0.05354, pruned_loss=0.003943, audio_tagging_loss=0.008142, over 16327.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09011, pruned_loss=0.01224, audio_tagging_loss=0.008836, over 3050287.85 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:37:08,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3328313.3333333335, ans=0.0 2023-11-26 09:37:09,557 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499250 2023-11-26 09:37:31,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3328446.6666666665, ans=0.0 2023-11-26 09:37:31,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. 
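
Note on the WithLoss lines (loss-sum=0.000e+00): these suggest some attention-weight tensors carry an attached auxiliary loss that is injected during backward, with the accumulated value logged periodically; a sum of zero would mean the auxiliary term is currently inactive. The standard autograd trick for attaching such a loss, as a sketch (this is an inference about the mechanism, not the actual scaling.py code):

import torch

class AttachLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, aux_loss):
        ctx.save_for_backward(aux_loss)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (aux_loss,) = ctx.saved_tensors
        # Feeding gradient 1 into aux_loss is equivalent to adding it to
        # the training objective, without changing the forward value of x.
        return grad_out, torch.ones_like(aux_loss)
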
limit=15.0 2023-11-26 09:37:36,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.808e+01 9.356e+01 1.010e+02 1.714e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 09:37:38,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3328446.6666666665, ans=0.125 2023-11-26 09:37:39,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3328446.6666666665, ans=0.125 2023-11-26 09:37:42,476 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6300, loss[loss=0.07097, simple_loss=0.09353, pruned_loss=0.01343, audio_tagging_loss=0.01077, over 15859.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08926, pruned_loss=0.01218, audio_tagging_loss=0.008935, over 3052430.04 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:37:53,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3328580.0, ans=0.1 2023-11-26 09:38:00,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3328580.0, ans=0.125 2023-11-26 09:38:03,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3328646.6666666665, ans=0.125 2023-11-26 09:38:05,833 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499300 2023-11-26 09:38:19,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3328713.3333333335, ans=0.0 2023-11-26 09:38:22,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=12.0 2023-11-26 09:38:30,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3328780.0, ans=0.1 2023-11-26 09:38:37,433 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6350, loss[loss=0.04429, simple_loss=0.05183, pruned_loss=0.005816, audio_tagging_loss=0.01256, over 14256.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08921, pruned_loss=0.01209, audio_tagging_loss=0.008996, over 3049048.38 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:38:47,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.41 vs. 
limit=15.0 2023-11-26 09:38:57,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3328913.3333333335, ans=0.0 2023-11-26 09:38:58,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3328913.3333333335, ans=0.0 2023-11-26 09:39:01,251 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499350 2023-11-26 09:39:02,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328980.0, ans=0.1 2023-11-26 09:39:06,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3328980.0, ans=0.0 2023-11-26 09:39:20,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3329046.6666666665, ans=0.2 2023-11-26 09:39:21,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3329113.3333333335, ans=0.125 2023-11-26 09:39:27,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 8.815e+01 9.462e+01 1.012e+02 1.581e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 09:39:33,589 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6400, loss[loss=0.06982, simple_loss=0.0937, pruned_loss=0.01276, audio_tagging_loss=0.01021, over 15673.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08894, pruned_loss=0.01211, audio_tagging_loss=0.009167, over 3041516.57 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:39:56,835 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499400 2023-11-26 09:40:10,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.78 vs. limit=10.0 2023-11-26 09:40:20,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3329446.6666666665, ans=0.0 2023-11-26 09:40:29,270 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6450, loss[loss=0.0557, simple_loss=0.07464, pruned_loss=0.007272, audio_tagging_loss=0.01111, over 15807.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08838, pruned_loss=0.01193, audio_tagging_loss=0.009192, over 3044681.04 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:40:29,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3329513.3333333335, ans=0.1 2023-11-26 09:40:43,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3329580.0, ans=0.0 2023-11-26 09:40:51,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3329646.6666666665, ans=0.125 2023-11-26 09:40:52,753 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499450 2023-11-26 09:41:08,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3329713.3333333335, ans=0.125 2023-11-26 09:41:19,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.730e+01 9.364e+01 9.937e+01 1.364e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 09:41:25,100 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6500, loss[loss=0.07547, simple_loss=0.1115, pruned_loss=0.01318, audio_tagging_loss=0.006563, over 15232.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08874, pruned_loss=0.01201, audio_tagging_loss=0.009144, over 3045722.51 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:41:34,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=22.5 2023-11-26 09:41:43,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2023-11-26 09:41:48,824 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499500 2023-11-26 09:41:50,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3329980.0, ans=0.5 2023-11-26 09:41:51,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5 2023-11-26 09:41:53,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3329980.0, ans=0.0 2023-11-26 09:42:20,847 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6550, loss[loss=0.07027, simple_loss=0.09493, pruned_loss=0.01535, audio_tagging_loss=0.007457, over 15627.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08891, pruned_loss=0.01208, audio_tagging_loss=0.009004, over 3045247.25 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:42:36,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.01 vs. 
limit=22.5 2023-11-26 09:42:43,938 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499550 2023-11-26 09:42:44,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3330313.3333333335, ans=0.125 2023-11-26 09:42:54,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3330380.0, ans=0.1 2023-11-26 09:42:59,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3330380.0, ans=0.2 2023-11-26 09:43:01,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2023-11-26 09:43:11,436 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.534e+01 9.204e+01 9.909e+01 1.481e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 09:43:11,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3330446.6666666665, ans=0.125 2023-11-26 09:43:11,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3330446.6666666665, ans=0.1 2023-11-26 09:43:15,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3330513.3333333335, ans=0.0 2023-11-26 09:43:16,670 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6600, loss[loss=0.05946, simple_loss=0.07865, pruned_loss=0.01149, audio_tagging_loss=0.008645, over 15511.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08847, pruned_loss=0.01212, audio_tagging_loss=0.008815, over 3041169.02 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:43:19,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3330513.3333333335, ans=0.125 2023-11-26 09:43:31,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3330580.0, ans=15.0 2023-11-26 09:43:39,878 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499600 2023-11-26 09:43:44,607 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:43:53,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3330713.3333333335, ans=0.125 2023-11-26 09:44:04,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3330780.0, ans=0.125 2023-11-26 09:44:07,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3330780.0, ans=0.125 2023-11-26 09:44:11,848 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6650, loss[loss=0.0681, simple_loss=0.09514, pruned_loss=0.01172, audio_tagging_loss=0.00881, over 14669.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08893, pruned_loss=0.01233, audio_tagging_loss=0.008822, over 3038701.53 frames. 
], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:44:36,717 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499650 2023-11-26 09:45:02,564 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.692e+01 9.274e+01 1.004e+02 1.194e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 09:45:04,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-11-26 09:45:07,937 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6700, loss[loss=0.06795, simple_loss=0.09178, pruned_loss=0.01332, audio_tagging_loss=0.008739, over 14720.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08927, pruned_loss=0.0122, audio_tagging_loss=0.008727, over 3043410.85 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:45:19,752 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:45:31,197 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499700 2023-11-26 09:45:34,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3331313.3333333335, ans=0.1 2023-11-26 09:45:38,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3331313.3333333335, ans=0.0 2023-11-26 09:45:38,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3331313.3333333335, ans=10.0 2023-11-26 09:45:54,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3331446.6666666665, ans=0.125 2023-11-26 09:46:04,045 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6750, loss[loss=0.07131, simple_loss=0.09781, pruned_loss=0.01346, audio_tagging_loss=0.008951, over 15144.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08837, pruned_loss=0.01207, audio_tagging_loss=0.008788, over 3032219.51 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:46:05,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3331513.3333333335, ans=0.1 2023-11-26 09:46:26,952 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499750 2023-11-26 09:46:27,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3331646.6666666665, ans=0.05 2023-11-26 09:46:37,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3331713.3333333335, ans=0.125 2023-11-26 09:46:50,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2023-11-26 09:46:53,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.795e+01 9.376e+01 1.027e+02 1.751e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 09:46:54,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. 
limit=6.0 2023-11-26 09:46:55,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3331780.0, ans=0.125 2023-11-26 09:46:59,205 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6800, loss[loss=0.05547, simple_loss=0.07384, pruned_loss=0.008301, audio_tagging_loss=0.01025, over 15918.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08912, pruned_loss=0.01209, audio_tagging_loss=0.00864, over 3037614.02 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:47:12,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3331913.3333333335, ans=0.2 2023-11-26 09:47:15,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3331913.3333333335, ans=0.125 2023-11-26 09:47:18,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3331913.3333333335, ans=0.0 2023-11-26 09:47:22,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3331980.0, ans=0.0 2023-11-26 09:47:23,618 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499800 2023-11-26 09:47:28,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3331980.0, ans=0.125 2023-11-26 09:47:28,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3331980.0, ans=0.2 2023-11-26 09:47:51,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3332113.3333333335, ans=0.125 2023-11-26 09:47:55,090 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6850, loss[loss=0.08513, simple_loss=0.1281, pruned_loss=0.01635, audio_tagging_loss=0.004721, over 16052.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08957, pruned_loss=0.01225, audio_tagging_loss=0.008608, over 3035910.16 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:47:55,303 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:47:59,057 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:48:07,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3332246.6666666665, ans=0.125 2023-11-26 09:48:16,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3332313.3333333335, ans=0.125 2023-11-26 09:48:18,872 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499850 2023-11-26 09:48:19,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3332313.3333333335, ans=0.2 2023-11-26 09:48:37,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3332380.0, ans=0.125 2023-11-26 09:48:38,675 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:48:43,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3332446.6666666665, ans=0.125 2023-11-26 09:48:43,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3332446.6666666665, ans=0.125 2023-11-26 09:48:45,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.735e+01 9.366e+01 1.004e+02 1.286e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 09:48:51,086 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6900, loss[loss=0.0689, simple_loss=0.09078, pruned_loss=0.01229, audio_tagging_loss=0.01123, over 14363.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09006, pruned_loss=0.01225, audio_tagging_loss=0.00859, over 3028103.27 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:48:56,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3332513.3333333335, ans=0.125 2023-11-26 09:48:58,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3332513.3333333335, ans=0.2 2023-11-26 09:49:03,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3332580.0, ans=0.0 2023-11-26 09:49:03,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2023-11-26 09:49:09,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3332580.0, ans=0.125 2023-11-26 09:49:13,457 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499900 2023-11-26 09:49:28,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3332713.3333333335, ans=0.0 2023-11-26 09:49:33,073 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:49:36,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3332780.0, ans=0.125 2023-11-26 09:49:43,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3332780.0, ans=0.125 2023-11-26 09:49:45,798 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6950, loss[loss=0.06796, simple_loss=0.09163, pruned_loss=0.01172, audio_tagging_loss=0.01041, over 15565.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09043, pruned_loss=0.01235, audio_tagging_loss=0.0085, over 3028864.89 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:49:58,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3332913.3333333335, ans=0.0 2023-11-26 09:50:00,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3332913.3333333335, ans=0.0 2023-11-26 09:50:09,639 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499950 2023-11-26 09:50:15,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3332980.0, ans=0.1 2023-11-26 09:50:19,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3333046.6666666665, ans=0.125 2023-11-26 09:50:27,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3333046.6666666665, ans=0.125 2023-11-26 09:50:33,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333113.3333333335, ans=0.1 2023-11-26 09:50:35,563 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.609e+01 9.217e+01 1.000e+02 1.228e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 09:50:40,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3333180.0, ans=0.5 2023-11-26 09:50:41,408 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7000, loss[loss=0.07012, simple_loss=0.08999, pruned_loss=0.01649, audio_tagging_loss=0.008631, over 16550.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09031, pruned_loss=0.01224, audio_tagging_loss=0.00857, over 3032045.91 frames. 
], batch size: 63, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:51:00,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3333246.6666666665, ans=0.1 2023-11-26 09:51:05,441 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500000 2023-11-26 09:51:09,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3333313.3333333335, ans=0.125 2023-11-26 09:51:15,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3333313.3333333335, ans=0.2 2023-11-26 09:51:16,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3333380.0, ans=0.0 2023-11-26 09:51:17,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3333380.0, ans=10.0 2023-11-26 09:51:21,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2023-11-26 09:51:28,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3333446.6666666665, ans=0.125 2023-11-26 09:51:40,246 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7050, loss[loss=0.05436, simple_loss=0.06951, pruned_loss=0.0103, audio_tagging_loss=0.009306, over 15152.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09062, pruned_loss=0.01244, audio_tagging_loss=0.00866, over 3033760.82 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:51:59,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3333580.0, ans=0.125 2023-11-26 09:52:01,669 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:52:02,571 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500050 2023-11-26 09:52:03,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3333646.6666666665, ans=0.125 2023-11-26 09:52:21,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3333713.3333333335, ans=0.0 2023-11-26 09:52:31,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.700e+01 9.418e+01 1.001e+02 1.210e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 09:52:35,497 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7100, loss[loss=0.06305, simple_loss=0.08428, pruned_loss=0.01148, audio_tagging_loss=0.009428, over 14900.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08959, pruned_loss=0.01234, audio_tagging_loss=0.008725, over 3035231.79 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:52:40,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.31 vs. 
limit=15.0 2023-11-26 09:52:46,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3333913.3333333335, ans=0.0 2023-11-26 09:52:57,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3333980.0, ans=0.0 2023-11-26 09:52:58,309 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500100 2023-11-26 09:53:08,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3334046.6666666665, ans=0.2 2023-11-26 09:53:30,081 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7150, loss[loss=0.0593, simple_loss=0.076, pruned_loss=0.01252, audio_tagging_loss=0.008786, over 14148.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08938, pruned_loss=0.01238, audio_tagging_loss=0.008809, over 3037205.54 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:53:42,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3334246.6666666665, ans=0.0 2023-11-26 09:53:54,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500150 2023-11-26 09:54:04,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0 2023-11-26 09:54:08,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-11-26 09:54:09,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3334380.0, ans=0.125 2023-11-26 09:54:17,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3334446.6666666665, ans=0.0 2023-11-26 09:54:21,462 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.683e+01 9.180e+01 1.005e+02 1.351e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 09:54:23,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2023-11-26 09:54:24,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3334446.6666666665, ans=0.125 2023-11-26 09:54:25,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.92 vs. limit=10.0 2023-11-26 09:54:26,320 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7200, loss[loss=0.06407, simple_loss=0.08633, pruned_loss=0.01343, audio_tagging_loss=0.007472, over 16032.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08907, pruned_loss=0.01239, audio_tagging_loss=0.00898, over 3033405.94 frames. 
], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:54:27,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3334513.3333333335, ans=0.0 2023-11-26 09:54:46,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3334580.0, ans=0.1 2023-11-26 09:54:47,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3334646.6666666665, ans=0.125 2023-11-26 09:54:49,689 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500200 2023-11-26 09:55:02,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3334713.3333333335, ans=0.125 2023-11-26 09:55:02,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3334713.3333333335, ans=0.125 2023-11-26 09:55:07,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3334713.3333333335, ans=0.125 2023-11-26 09:55:15,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3334780.0, ans=0.1 2023-11-26 09:55:22,860 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7250, loss[loss=0.07722, simple_loss=0.1012, pruned_loss=0.01531, audio_tagging_loss=0.0113, over 15883.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08931, pruned_loss=0.01226, audio_tagging_loss=0.009058, over 3038700.86 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:55:31,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=15.0 2023-11-26 09:55:45,991 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500250 2023-11-26 09:56:15,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.852e+01 9.361e+01 1.013e+02 1.372e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:56:18,398 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7300, loss[loss=0.07439, simple_loss=0.1048, pruned_loss=0.01421, audio_tagging_loss=0.007767, over 16612.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09008, pruned_loss=0.01233, audio_tagging_loss=0.008815, over 3037521.18 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:56:37,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3335246.6666666665, ans=0.05 2023-11-26 09:56:38,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3335246.6666666665, ans=0.125 2023-11-26 09:56:42,135 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500300 2023-11-26 09:56:47,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3335313.3333333335, ans=0.125 2023-11-26 09:56:52,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.76 vs. 
limit=22.5 2023-11-26 09:56:54,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3335380.0, ans=0.1 2023-11-26 09:56:55,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3335380.0, ans=0.125 2023-11-26 09:56:55,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3335380.0, ans=0.125 2023-11-26 09:57:00,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3335380.0, ans=0.125 2023-11-26 09:57:08,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3335446.6666666665, ans=0.1 2023-11-26 09:57:08,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=15.0 2023-11-26 09:57:14,263 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7350, loss[loss=0.07332, simple_loss=0.1116, pruned_loss=0.01336, audio_tagging_loss=0.00419, over 15748.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09028, pruned_loss=0.01249, audio_tagging_loss=0.008736, over 3043684.37 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:57:20,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3335513.3333333335, ans=0.2 2023-11-26 09:57:37,771 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500350 2023-11-26 09:57:44,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3335646.6666666665, ans=0.0 2023-11-26 09:57:52,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3335713.3333333335, ans=0.2 2023-11-26 09:57:55,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3335713.3333333335, ans=0.2 2023-11-26 09:57:57,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3335780.0, ans=0.2 2023-11-26 09:58:06,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.616e+01 9.240e+01 1.003e+02 1.313e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 09:58:09,962 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7400, loss[loss=0.07421, simple_loss=0.1003, pruned_loss=0.01467, audio_tagging_loss=0.009417, over 16379.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09041, pruned_loss=0.01244, audio_tagging_loss=0.008713, over 3050427.15 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:58:10,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3335846.6666666665, ans=0.125 2023-11-26 09:58:21,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3335913.3333333335, ans=0.05 2023-11-26 09:58:22,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.76 vs. 
limit=12.0 2023-11-26 09:58:33,295 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500400 2023-11-26 09:58:40,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=12.0 2023-11-26 09:59:05,896 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7450, loss[loss=0.04576, simple_loss=0.05922, pruned_loss=0.006774, audio_tagging_loss=0.009377, over 14893.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08999, pruned_loss=0.01228, audio_tagging_loss=0.008698, over 3049510.30 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:59:21,986 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:59:29,767 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500450 2023-11-26 09:59:58,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.877e+01 9.404e+01 1.015e+02 1.370e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 10:00:01,516 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7500, loss[loss=0.06741, simple_loss=0.08925, pruned_loss=0.01294, audio_tagging_loss=0.009845, over 16224.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08965, pruned_loss=0.01227, audio_tagging_loss=0.008736, over 3049654.24 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:00:25,307 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500500 2023-11-26 10:00:32,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3336646.6666666665, ans=0.125 2023-11-26 10:00:37,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3336713.3333333335, ans=0.0 2023-11-26 10:00:53,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5 2023-11-26 10:00:57,571 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7550, loss[loss=0.07062, simple_loss=0.09154, pruned_loss=0.01563, audio_tagging_loss=0.009215, over 15276.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08921, pruned_loss=0.0122, audio_tagging_loss=0.008687, over 3045162.58 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:00:57,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3336846.6666666665, ans=0.5 2023-11-26 10:00:59,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3336846.6666666665, ans=0.125 2023-11-26 10:01:21,050 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500550 2023-11-26 10:01:36,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337046.6666666665, ans=0.1 2023-11-26 10:01:49,618 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.478e+01 8.999e+01 9.554e+01 1.187e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-26 10:01:53,388 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7600, loss[loss=0.09383, simple_loss=0.1272, pruned_loss=0.02204, audio_tagging_loss=0.008207, over 15792.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08915, pruned_loss=0.01222, audio_tagging_loss=0.008705, over 3035263.28 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:02:14,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3337313.3333333335, ans=10.0 2023-11-26 10:02:17,240 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500600 2023-11-26 10:02:30,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3337380.0, ans=0.2 2023-11-26 10:02:32,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3337380.0, ans=0.1 2023-11-26 10:02:44,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337446.6666666665, ans=0.1 2023-11-26 10:02:47,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3337446.6666666665, ans=0.1 2023-11-26 10:02:48,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3337513.3333333335, ans=0.09899494936611666 2023-11-26 10:02:49,126 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7650, loss[loss=0.0616, simple_loss=0.09095, pruned_loss=0.009735, audio_tagging_loss=0.006387, over 14436.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08952, pruned_loss=0.01224, audio_tagging_loss=0.008619, over 3039358.48 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:03:05,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3337580.0, ans=0.125 2023-11-26 10:03:06,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3337580.0, ans=0.0 2023-11-26 10:03:11,922 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:03:12,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500650 2023-11-26 10:03:41,956 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.661e+01 8.670e+01 9.156e+01 1.010e+02 1.245e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-26 10:03:43,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3337780.0, ans=0.035 2023-11-26 10:03:45,692 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7700, loss[loss=0.06786, simple_loss=0.09837, pruned_loss=0.01113, audio_tagging_loss=0.007551, over 15285.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08941, pruned_loss=0.01219, audio_tagging_loss=0.008726, over 3046312.58 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:03:49,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3337846.6666666665, ans=0.125 2023-11-26 10:03:52,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3337846.6666666665, ans=0.125 2023-11-26 10:04:00,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3337913.3333333335, ans=0.125 2023-11-26 10:04:08,463 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500700 2023-11-26 10:04:15,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3337980.0, ans=0.0 2023-11-26 10:04:33,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3338113.3333333335, ans=0.0 2023-11-26 10:04:35,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3338113.3333333335, ans=0.125 2023-11-26 10:04:40,558 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7750, loss[loss=0.07704, simple_loss=0.1018, pruned_loss=0.01593, audio_tagging_loss=0.01022, over 14655.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08869, pruned_loss=0.0121, audio_tagging_loss=0.008742, over 3044138.67 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:04:41,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3338180.0, ans=0.0 2023-11-26 10:04:54,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2023-11-26 10:04:59,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3338246.6666666665, ans=0.1 2023-11-26 10:05:04,488 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500750 2023-11-26 10:05:10,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=12.0 2023-11-26 10:05:11,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.60 vs. limit=10.0 2023-11-26 10:05:12,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3338313.3333333335, ans=0.0 2023-11-26 10:05:14,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3338380.0, ans=0.125 2023-11-26 10:05:32,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 9.045e+01 9.530e+01 1.049e+02 1.522e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 10:05:36,764 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7800, loss[loss=0.05853, simple_loss=0.07515, pruned_loss=0.009499, audio_tagging_loss=0.01146, over 14443.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08963, pruned_loss=0.01237, audio_tagging_loss=0.008716, over 3041150.98 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:05:39,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.41 vs. 
limit=15.0 2023-11-26 10:05:42,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2023-11-26 10:05:43,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3338513.3333333335, ans=0.125 2023-11-26 10:05:48,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3338580.0, ans=0.05 2023-11-26 10:05:50,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-11-26 10:05:59,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=22.5 2023-11-26 10:06:00,198 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500800 2023-11-26 10:06:27,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3338780.0, ans=0.0 2023-11-26 10:06:32,785 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7850, loss[loss=0.05103, simple_loss=0.06058, pruned_loss=0.008689, audio_tagging_loss=0.01205, over 14971.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08934, pruned_loss=0.01232, audio_tagging_loss=0.008839, over 3039178.21 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:06:48,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3338913.3333333335, ans=0.1 2023-11-26 10:06:49,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3338913.3333333335, ans=0.125 2023-11-26 10:06:55,623 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500850 2023-11-26 10:07:16,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3339113.3333333335, ans=0.125 2023-11-26 10:07:25,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.991e+01 9.385e+01 9.905e+01 1.223e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 10:07:28,430 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7900, loss[loss=0.0746, simple_loss=0.1124, pruned_loss=0.01212, audio_tagging_loss=0.006287, over 15434.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08993, pruned_loss=0.01237, audio_tagging_loss=0.008823, over 3046886.93 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:07:30,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3339180.0, ans=0.125 2023-11-26 10:07:41,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3339246.6666666665, ans=0.125 2023-11-26 10:07:48,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3339246.6666666665, ans=0.2 2023-11-26 10:07:52,008 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500900 2023-11-26 10:07:52,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3339313.3333333335, ans=0.125 2023-11-26 10:08:11,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3339446.6666666665, ans=0.1 2023-11-26 10:08:23,020 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7950, loss[loss=0.08679, simple_loss=0.1323, pruned_loss=0.01355, audio_tagging_loss=0.007087, over 15594.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09053, pruned_loss=0.01239, audio_tagging_loss=0.008926, over 3053179.46 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:08:38,284 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:08:44,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3339580.0, ans=0.125 2023-11-26 10:08:47,346 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500950 2023-11-26 10:08:48,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.98 vs. limit=22.5 2023-11-26 10:09:07,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3339780.0, ans=0.125 2023-11-26 10:09:10,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0 2023-11-26 10:09:15,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.923e+01 9.464e+01 1.020e+02 1.321e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 10:09:19,572 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8000, loss[loss=0.0552, simple_loss=0.07197, pruned_loss=0.008925, audio_tagging_loss=0.01029, over 14560.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08928, pruned_loss=0.0122, audio_tagging_loss=0.009112, over 3050530.77 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:09:21,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3339846.6666666665, ans=0.1 2023-11-26 10:09:34,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3339913.3333333335, ans=0.09899494936611666 2023-11-26 10:09:41,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3339980.0, ans=0.0 2023-11-26 10:09:42,391 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501000 2023-11-26 10:09:52,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3340046.6666666665, ans=0.95 2023-11-26 10:10:00,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3340046.6666666665, ans=0.0 2023-11-26 10:10:10,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3340113.3333333335, ans=0.1 2023-11-26 10:10:15,404 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8050, loss[loss=0.05764, simple_loss=0.07354, pruned_loss=0.008975, audio_tagging_loss=0.01189, over 15168.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09004, pruned_loss=0.0124, audio_tagging_loss=0.009095, over 3046324.42 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:10:17,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3340180.0, ans=0.1 2023-11-26 10:10:28,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3340246.6666666665, ans=0.2 2023-11-26 10:10:37,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3340313.3333333335, ans=0.2 2023-11-26 10:10:38,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501050 2023-11-26 10:10:48,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3340380.0, ans=0.1 2023-11-26 10:11:07,299 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.742e+01 9.437e+01 9.965e+01 1.262e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 10:11:10,524 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8100, loss[loss=0.06803, simple_loss=0.09049, pruned_loss=0.0153, audio_tagging_loss=0.007494, over 15471.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.0907, pruned_loss=0.01248, audio_tagging_loss=0.008939, over 3042745.92 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:11:17,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3340513.3333333335, ans=0.0 2023-11-26 10:11:30,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3340580.0, ans=0.0 2023-11-26 10:11:35,128 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501100 2023-11-26 10:11:49,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. 
limit=12.0 2023-11-26 10:12:02,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3340780.0, ans=0.0 2023-11-26 10:12:02,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0 2023-11-26 10:12:05,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3340846.6666666665, ans=0.125 2023-11-26 10:12:07,297 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8150, loss[loss=0.07422, simple_loss=0.0956, pruned_loss=0.01873, audio_tagging_loss=0.007682, over 14778.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09127, pruned_loss=0.01265, audio_tagging_loss=0.008786, over 3043269.87 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:12:14,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0 2023-11-26 10:12:17,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3340913.3333333335, ans=0.125 2023-11-26 10:12:20,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3340913.3333333335, ans=0.125 2023-11-26 10:12:27,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=22.5 2023-11-26 10:12:30,312 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501150 2023-11-26 10:12:34,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3340980.0, ans=10.0 2023-11-26 10:12:38,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3340980.0, ans=0.125 2023-11-26 10:13:00,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 8.798e+01 9.302e+01 1.005e+02 1.230e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 10:13:00,657 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:13:02,554 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8200, loss[loss=0.06756, simple_loss=0.09526, pruned_loss=0.01274, audio_tagging_loss=0.007188, over 15597.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09116, pruned_loss=0.01244, audio_tagging_loss=0.008701, over 3045685.58 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:13:03,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2023-11-26 10:13:03,686 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 10:13:10,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3341180.0, ans=0.0 2023-11-26 10:13:12,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3341246.6666666665, ans=0.0 2023-11-26 10:13:17,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.89 vs. limit=10.0 2023-11-26 10:13:18,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3341246.6666666665, ans=0.0 2023-11-26 10:13:25,286 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501200 2023-11-26 10:13:32,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2023-11-26 10:13:35,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3341380.0, ans=0.2 2023-11-26 10:13:38,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2023-11-26 10:13:38,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2023-11-26 10:13:42,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3341380.0, ans=0.0 2023-11-26 10:13:43,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3341380.0, ans=0.125 2023-11-26 10:13:51,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3341446.6666666665, ans=15.0 2023-11-26 10:13:53,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3341446.6666666665, ans=0.0 2023-11-26 10:13:57,319 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8250, loss[loss=0.06794, simple_loss=0.08683, pruned_loss=0.01577, audio_tagging_loss=0.008753, over 14926.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09023, pruned_loss=0.01237, audio_tagging_loss=0.008635, over 3044026.16 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:14:21,694 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501250 2023-11-26 10:14:25,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3341646.6666666665, ans=0.125 2023-11-26 10:14:27,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3341646.6666666665, ans=0.0 2023-11-26 10:14:34,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3341713.3333333335, ans=0.1 2023-11-26 10:14:41,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3341780.0, ans=0.125 2023-11-26 10:14:46,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3341780.0, ans=0.2 2023-11-26 10:14:50,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.840e+01 9.471e+01 1.008e+02 1.505e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 10:14:52,775 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8300, loss[loss=0.06529, simple_loss=0.0837, pruned_loss=0.0121, audio_tagging_loss=0.01135, over 14008.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.0903, pruned_loss=0.01237, audio_tagging_loss=0.008571, over 3046733.49 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:15:10,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3341913.3333333335, ans=0.1 2023-11-26 10:15:16,632 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501300 2023-11-26 10:15:43,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3342113.3333333335, ans=0.125 2023-11-26 10:15:45,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3342113.3333333335, ans=0.2 2023-11-26 10:15:48,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.77 vs. limit=22.5 2023-11-26 10:15:49,214 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8350, loss[loss=0.04856, simple_loss=0.06364, pruned_loss=0.006993, audio_tagging_loss=0.009744, over 15236.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09006, pruned_loss=0.01227, audio_tagging_loss=0.008532, over 3048196.71 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:16:11,952 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501350 2023-11-26 10:16:16,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3342313.3333333335, ans=0.0 2023-11-26 10:16:27,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3342380.0, ans=0.1 2023-11-26 10:16:32,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.28 vs. 
limit=22.5 2023-11-26 10:16:32,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3342446.6666666665, ans=0.125 2023-11-26 10:16:34,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3342446.6666666665, ans=0.125 2023-11-26 10:16:37,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-26 10:16:42,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.918e+01 9.503e+01 1.018e+02 1.589e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 10:16:44,282 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8400, loss[loss=0.08594, simple_loss=0.1116, pruned_loss=0.02033, audio_tagging_loss=0.009793, over 14633.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09044, pruned_loss=0.01241, audio_tagging_loss=0.008422, over 3042994.53 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:16:48,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0 2023-11-26 10:17:05,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3342646.6666666665, ans=0.1 2023-11-26 10:17:08,417 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501400 2023-11-26 10:17:10,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3342646.6666666665, ans=0.07 2023-11-26 10:17:10,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=12.0 2023-11-26 10:17:33,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-26 10:17:40,348 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8450, loss[loss=0.0479, simple_loss=0.0598, pruned_loss=0.007575, audio_tagging_loss=0.01042, over 14456.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.0897, pruned_loss=0.01238, audio_tagging_loss=0.008564, over 3047325.76 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:17:43,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3342846.6666666665, ans=0.0 2023-11-26 10:17:47,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3342846.6666666665, ans=0.0 2023-11-26 10:17:48,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3342846.6666666665, ans=0.0 2023-11-26 10:18:03,592 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501450 2023-11-26 10:18:04,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. 
limit=15.0 2023-11-26 10:18:08,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3342980.0, ans=0.125 2023-11-26 10:18:24,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3343113.3333333335, ans=0.125 2023-11-26 10:18:33,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 8.791e+01 9.317e+01 9.949e+01 1.409e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 10:18:36,379 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8500, loss[loss=0.07668, simple_loss=0.1078, pruned_loss=0.01207, audio_tagging_loss=0.01072, over 15556.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.0901, pruned_loss=0.01228, audio_tagging_loss=0.008621, over 3054514.88 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:18:46,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.73 vs. limit=10.0 2023-11-26 10:18:47,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3343246.6666666665, ans=0.125 2023-11-26 10:18:58,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=15.0 2023-11-26 10:18:59,109 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501500 2023-11-26 10:19:00,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3343313.3333333335, ans=0.125 2023-11-26 10:19:02,459 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:19:29,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-11-26 10:19:30,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3343513.3333333335, ans=0.125 2023-11-26 10:19:31,400 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8550, loss[loss=0.08435, simple_loss=0.1098, pruned_loss=0.02129, audio_tagging_loss=0.008163, over 14948.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09116, pruned_loss=0.01247, audio_tagging_loss=0.008621, over 3054487.87 frames. 
], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:19:46,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3343580.0, ans=0.0 2023-11-26 10:19:48,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3343580.0, ans=0.0 2023-11-26 10:19:51,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3343580.0, ans=0.125 2023-11-26 10:19:54,778 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501550 2023-11-26 10:20:00,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3343646.6666666665, ans=0.1 2023-11-26 10:20:02,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3343646.6666666665, ans=0.125 2023-11-26 10:20:13,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3343713.3333333335, ans=0.05 2023-11-26 10:20:16,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-11-26 10:20:24,834 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.767e+01 9.434e+01 1.006e+02 1.411e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 10:20:26,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3343846.6666666665, ans=0.125 2023-11-26 10:20:26,935 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8600, loss[loss=0.06044, simple_loss=0.08293, pruned_loss=0.009045, audio_tagging_loss=0.009929, over 15967.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.0914, pruned_loss=0.01248, audio_tagging_loss=0.00868, over 3054129.24 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:20:44,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3343913.3333333335, ans=0.125 2023-11-26 10:20:49,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3343980.0, ans=0.05 2023-11-26 10:20:50,931 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501600 2023-11-26 10:20:53,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3343980.0, ans=0.125 2023-11-26 10:21:03,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.55 vs. limit=15.0 2023-11-26 10:21:20,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3344113.3333333335, ans=0.0 2023-11-26 10:21:23,374 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8650, loss[loss=0.04788, simple_loss=0.05706, pruned_loss=0.009557, audio_tagging_loss=0.009797, over 15458.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.0904, pruned_loss=0.01226, audio_tagging_loss=0.008765, over 3058386.30 frames. 
], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:21:46,203 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501650 2023-11-26 10:22:07,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.37 vs. limit=10.0 2023-11-26 10:22:09,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3344446.6666666665, ans=0.0 2023-11-26 10:22:18,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.843e+01 9.565e+01 1.046e+02 1.310e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 10:22:19,265 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8700, loss[loss=0.06336, simple_loss=0.08038, pruned_loss=0.01106, audio_tagging_loss=0.01211, over 15463.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09127, pruned_loss=0.01256, audio_tagging_loss=0.008805, over 3062156.19 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:22:29,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3344580.0, ans=0.125 2023-11-26 10:22:33,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3344580.0, ans=0.05 2023-11-26 10:22:42,578 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501700 2023-11-26 10:22:46,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3344646.6666666665, ans=0.2 2023-11-26 10:22:50,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3344646.6666666665, ans=0.1 2023-11-26 10:22:51,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3344713.3333333335, ans=0.125 2023-11-26 10:23:02,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3344780.0, ans=0.125 2023-11-26 10:23:05,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3344780.0, ans=0.1 2023-11-26 10:23:06,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3344780.0, ans=0.125 2023-11-26 10:23:12,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3344780.0, ans=0.125 2023-11-26 10:23:14,992 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8750, loss[loss=0.06051, simple_loss=0.08188, pruned_loss=0.009872, audio_tagging_loss=0.009699, over 15224.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09103, pruned_loss=0.01253, audio_tagging_loss=0.008782, over 3053381.96 frames. 
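The optim.py records report the minimum, quartiles, and maximum of recent gradient norms next to the clipping threshold; with Clipping_scale=2.0 the threshold tracks twice the running median (e.g. 2 * 9.565e+01 = 1.913e+02 in the record above), so percent-clipped stays at 0.0 while the norm distribution is stable. A sketch of median-based clipping under that assumption (the class and its internals are illustrative, not the optimizer's actual implementation):

    from collections import deque

    import torch

    class MedianGradClipper:
        """Clip gradients to clipping_scale * median of recent grad norms."""
        def __init__(self, clipping_scale=2.0, history=128):
            self.scale = clipping_scale
            self.norms = deque(maxlen=history)

        def clip_(self, parameters):
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            self.norms.append(norm.item())
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.scale * median
            if norm > threshold:
                for g in grads:
                    g.mul_(threshold / norm)
            return norm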
], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:23:25,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3344913.3333333335, ans=0.2 2023-11-26 10:23:32,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3344913.3333333335, ans=0.2 2023-11-26 10:23:38,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501750 2023-11-26 10:23:58,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3345113.3333333335, ans=0.0 2023-11-26 10:24:09,411 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.663e+01 8.969e+01 9.428e+01 9.946e+01 1.483e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 10:24:10,491 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8800, loss[loss=0.06079, simple_loss=0.08159, pruned_loss=0.006663, audio_tagging_loss=0.01333, over 15337.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09149, pruned_loss=0.01287, audio_tagging_loss=0.009001, over 3059249.46 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:24:18,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.35 vs. limit=15.0 2023-11-26 10:24:28,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=12.0 2023-11-26 10:24:34,015 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501800 2023-11-26 10:25:01,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3345446.6666666665, ans=0.1 2023-11-26 10:25:02,410 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:25:06,488 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8850, loss[loss=0.05038, simple_loss=0.06731, pruned_loss=0.01016, audio_tagging_loss=0.006568, over 15644.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09139, pruned_loss=0.01282, audio_tagging_loss=0.008981, over 3059254.27 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:25:12,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3345513.3333333335, ans=0.125 2023-11-26 10:25:18,739 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 10:25:22,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3345580.0, ans=0.1 2023-11-26 10:25:27,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3345646.6666666665, ans=0.125 2023-11-26 10:25:29,770 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501850 2023-11-26 10:25:30,927 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:25:39,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3345713.3333333335, ans=0.2 2023-11-26 10:25:55,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3345780.0, ans=0.125 2023-11-26 10:26:01,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2023-11-26 10:26:01,687 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8900, loss[loss=0.07396, simple_loss=0.09962, pruned_loss=0.01588, audio_tagging_loss=0.008269, over 14907.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09219, pruned_loss=0.01289, audio_tagging_loss=0.008809, over 3055484.36 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:26:02,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.662e+01 9.297e+01 1.004e+02 1.286e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 10:26:13,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3345913.3333333335, ans=0.125 2023-11-26 10:26:25,478 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501900 2023-11-26 10:26:33,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3345980.0, ans=0.1 2023-11-26 10:26:48,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3346113.3333333335, ans=0.0 2023-11-26 10:26:57,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3346180.0, ans=0.125 2023-11-26 10:26:57,806 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8950, loss[loss=0.0627, simple_loss=0.0843, pruned_loss=0.01126, audio_tagging_loss=0.009295, over 15301.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09256, pruned_loss=0.01291, audio_tagging_loss=0.008698, over 3058167.51 frames. 
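The WARNING above shows the filter that protects the transducer loss: a cut whose 100 input frames shrink to 23 after the convolutional front-end and 4x subsampling cannot align 24 BPE tokens, so it is excluded; the affected cuts are AudioSet clips carrying a dummy transcript. A sketch of that rule, assuming the usual conv front-end arithmetic (the exact formula in the recipe may differ):

    SUBSAMPLING_FACTOR = 4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # 100 input frames -> (100 - 7) // 4 = 23, matching the log
        frames_after = (num_frames - 7) // SUBSAMPLING_FACTOR
        return frames_after >= num_tokens

    assert not keep_cut(100, 24)   # 23 < 24 -> excluded, as logged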
], batch size: 58, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:26:58,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3346180.0, ans=0.1 2023-11-26 10:27:04,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3346180.0, ans=0.125 2023-11-26 10:27:14,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3346246.6666666665, ans=0.0 2023-11-26 10:27:19,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3346313.3333333335, ans=0.1 2023-11-26 10:27:20,552 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501950 2023-11-26 10:27:20,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3346313.3333333335, ans=0.0 2023-11-26 10:27:24,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3346313.3333333335, ans=0.0 2023-11-26 10:27:32,400 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:27:53,378 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9000, loss[loss=0.06093, simple_loss=0.08691, pruned_loss=0.009211, audio_tagging_loss=0.008261, over 16200.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09246, pruned_loss=0.01278, audio_tagging_loss=0.008634, over 3052249.73 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:27:53,379 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 10:28:08,276 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7348, 4.5899, 4.4024, 4.3994], device='cuda:1') 2023-11-26 10:28:26,042 INFO [train_asr.py:1267] (1/4) Epoch 42, validation: loss=0.05901, simple_loss=0.0506, pruned_loss=0.005264, audio_tagging_loss=0.02845, over 4681554.00 frames. 2023-11-26 10:28:26,043 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 10:28:27,064 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.869e+01 9.478e+01 9.908e+01 1.192e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 10:28:39,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3346580.0, ans=0.1 2023-11-26 10:28:49,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502000 2023-11-26 10:28:50,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3346646.6666666665, ans=0.0 2023-11-26 10:28:54,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-26 10:28:57,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3346713.3333333335, ans=0.0 2023-11-26 10:29:11,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3346780.0, ans=0.09899494936611666 2023-11-26 10:29:21,685 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9050, loss[loss=0.0647, simple_loss=0.0849, pruned_loss=0.01379, audio_tagging_loss=0.008466, over 14657.00 frames. 
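The same weighted-sum identity holds for the batch-9000 validation record above: 0.5 * 0.0506 + 0.005264 + 0.02845 = 0.059014, matching the printed loss=0.05901. Note the component balance differs from training: on this evaluation set the audio-tagging term exceeds the pruned term by roughly 5x.

    # Validation record, batch 9000 (weights as assumed after batch 8400):
    assert abs(0.5 * 0.0506 + 0.005264 + 0.02845 - 0.05901) < 5e-5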
], tot_loss[loss=0.06706, simple_loss=0.09183, pruned_loss=0.01253, audio_tagging_loss=0.008613, over 3050528.62 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:29:21,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3346846.6666666665, ans=0.0 2023-11-26 10:29:44,454 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502050 2023-11-26 10:29:48,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3346980.0, ans=0.125 2023-11-26 10:29:48,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=22.5 2023-11-26 10:29:55,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3347046.6666666665, ans=6.0 2023-11-26 10:30:17,419 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9100, loss[loss=0.07464, simple_loss=0.0968, pruned_loss=0.01565, audio_tagging_loss=0.01059, over 15163.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09141, pruned_loss=0.01254, audio_tagging_loss=0.008607, over 3055345.43 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:30:17,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2023-11-26 10:30:18,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.587e+01 9.359e+01 1.002e+02 1.217e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 10:30:19,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3347180.0, ans=0.0 2023-11-26 10:30:41,780 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502100 2023-11-26 10:30:50,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3347313.3333333335, ans=0.0 2023-11-26 10:30:54,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3347380.0, ans=0.0 2023-11-26 10:31:13,165 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9150, loss[loss=0.07346, simple_loss=0.09927, pruned_loss=0.01655, audio_tagging_loss=0.007275, over 14413.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09099, pruned_loss=0.01243, audio_tagging_loss=0.008568, over 3046583.26 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:31:16,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.57 vs. limit=8.0 2023-11-26 10:31:19,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3347513.3333333335, ans=0.125 2023-11-26 10:31:25,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-26 10:31:34,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.45 vs. 
limit=12.0 2023-11-26 10:31:37,543 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502150 2023-11-26 10:31:53,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3347713.3333333335, ans=0.125 2023-11-26 10:31:58,313 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:32:04,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.54 vs. limit=15.0 2023-11-26 10:32:10,069 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9200, loss[loss=0.06609, simple_loss=0.09187, pruned_loss=0.01091, audio_tagging_loss=0.009246, over 14874.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09093, pruned_loss=0.01252, audio_tagging_loss=0.008574, over 3051662.02 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:32:10,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0 2023-11-26 10:32:11,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.621e+01 9.324e+01 1.045e+02 1.374e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 10:32:16,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0 2023-11-26 10:32:32,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3347980.0, ans=0.0 2023-11-26 10:32:33,169 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502200 2023-11-26 10:32:39,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3347980.0, ans=0.125 2023-11-26 10:32:40,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3347980.0, ans=0.5 2023-11-26 10:32:47,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3348046.6666666665, ans=0.1 2023-11-26 10:32:54,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3348113.3333333335, ans=0.2 2023-11-26 10:33:04,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.42 vs. limit=22.5 2023-11-26 10:33:06,574 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9250, loss[loss=0.05478, simple_loss=0.07692, pruned_loss=0.006558, audio_tagging_loss=0.009765, over 14664.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.0906, pruned_loss=0.01255, audio_tagging_loss=0.008603, over 3053844.25 frames. 
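The grad_scale column traces the mixed-precision loss scaler: it backs off 32 -> 16 -> 8 between batches 8600 and 8900 (each halving signals an overflowed fp16 step whose update was skipped) and grows back, 8 -> 16 by batch 9200 and to 32 by batch 9600, after sustained clean steps. A sketch with torch's stock scaler, assuming stock behaviour apart from the values shown (the recipe may use a customized scaler, and the growth interval below is an assumption, not read from the log):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                       growth_factor=2.0,
                                       backoff_factor=0.5,
                                       growth_interval=2000)

    # Per step, assuming `loss` and `optimizer` exist:
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)   # skips the update if grads hold inf/nan
    #   scaler.update()          # halves on overflow, doubles after a clean run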
], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:33:17,485 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:33:19,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3348246.6666666665, ans=0.125 2023-11-26 10:33:30,085 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502250 2023-11-26 10:33:40,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3348380.0, ans=0.2 2023-11-26 10:34:02,025 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9300, loss[loss=0.06701, simple_loss=0.09324, pruned_loss=0.0123, audio_tagging_loss=0.008089, over 14058.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09048, pruned_loss=0.01244, audio_tagging_loss=0.008625, over 3056058.70 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:34:03,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.725e+01 9.420e+01 1.023e+02 1.550e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 10:34:04,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3348513.3333333335, ans=0.2 2023-11-26 10:34:05,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3348513.3333333335, ans=0.1 2023-11-26 10:34:07,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3348513.3333333335, ans=0.05 2023-11-26 10:34:26,038 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502300 2023-11-26 10:34:57,937 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9350, loss[loss=0.06604, simple_loss=0.0927, pruned_loss=0.01284, audio_tagging_loss=0.006851, over 15124.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09078, pruned_loss=0.01243, audio_tagging_loss=0.008573, over 3055786.26 frames. 
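The ScheduledFloat records log module hyperparameters (dropout_p, skip rates, balancer probabilities) that are functions of the training batch count rather than constants; each prints its name, the batch_count it was evaluated at, and the resulting value (ans). By batch_count ~3.35e6 every schedule is far past its final breakpoint, which is why the same values (0.1, 0.125, 0.0, ...) recur. A minimal piecewise-linear sketch; the breakpoints are illustrative, not the recipe's:

    def scheduled_float(batch_count, points=((0.0, 0.3), (20000.0, 0.1))):
        """Linear between two breakpoints, constant outside them."""
        (x0, y0), (x1, y1) = points
        if batch_count <= x0:
            return y0
        if batch_count >= x1:
            return y1
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    print(scheduled_float(3342980.0))  # -> 0.1, like the dropout_p records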
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:35:04,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3348846.6666666665, ans=0.125 2023-11-26 10:35:09,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3348913.3333333335, ans=0.0 2023-11-26 10:35:12,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3348913.3333333335, ans=0.1 2023-11-26 10:35:14,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3348913.3333333335, ans=0.2 2023-11-26 10:35:15,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3348913.3333333335, ans=0.125 2023-11-26 10:35:21,334 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502350 2023-11-26 10:35:33,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3349046.6666666665, ans=0.125 2023-11-26 10:35:36,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3349046.6666666665, ans=0.125 2023-11-26 10:35:54,259 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9400, loss[loss=0.04979, simple_loss=0.0702, pruned_loss=0.007472, audio_tagging_loss=0.00722, over 15464.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09059, pruned_loss=0.01252, audio_tagging_loss=0.008634, over 3058558.67 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:35:55,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.779e+01 9.527e+01 1.025e+02 1.453e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 10:35:57,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2023-11-26 10:36:02,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3349180.0, ans=0.0 2023-11-26 10:36:06,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3349246.6666666665, ans=0.125 2023-11-26 10:36:11,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3349246.6666666665, ans=0.125 2023-11-26 10:36:13,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3349246.6666666665, ans=0.125 2023-11-26 10:36:17,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502400 2023-11-26 10:36:21,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3349313.3333333335, ans=0.125 2023-11-26 10:36:27,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.07 vs. 
limit=15.0 2023-11-26 10:36:30,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3349380.0, ans=0.04949747468305833 2023-11-26 10:36:35,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-26 10:36:42,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349446.6666666665, ans=0.1 2023-11-26 10:36:47,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3349446.6666666665, ans=0.0 2023-11-26 10:36:50,157 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9450, loss[loss=0.05732, simple_loss=0.07834, pruned_loss=0.008127, audio_tagging_loss=0.01003, over 15878.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09093, pruned_loss=0.01242, audio_tagging_loss=0.008688, over 3056437.47 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:36:50,191 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:36:52,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3349513.3333333335, ans=0.125 2023-11-26 10:36:55,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3349513.3333333335, ans=0.2 2023-11-26 10:36:57,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3349513.3333333335, ans=0.0 2023-11-26 10:36:58,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2023-11-26 10:37:02,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3349580.0, ans=0.125 2023-11-26 10:37:07,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.77 vs. limit=15.0 2023-11-26 10:37:11,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=12.0 2023-11-26 10:37:14,683 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502450 2023-11-26 10:37:24,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3349713.3333333335, ans=0.125 2023-11-26 10:37:26,563 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:37:26,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.04 vs. 
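Each Whitening record compares a measured whiteness metric for a module's activations against a scheduled limit; the auxiliary penalty only engages when the metric exceeds the limit, so most of these lines record a metric safely under its limit and no penalty is applied (e.g. metric=12.51 vs. limit=15.0 above). A hedged sketch of one plausible metric, the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue (exactly 1.0 for perfectly white features); the estimator in scaling.py may differ in detail:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (N, C) activations; average the metric over channel groups
        n, c = x.shape
        vals = []
        for g in x.reshape(n, num_groups, c // num_groups).unbind(dim=1):
            cov = (g.T @ g) / n
            eigs = torch.linalg.eigvalsh(cov)
            vals.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
        return sum(vals) / num_groups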
limit=15.0 2023-11-26 10:37:35,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3349780.0, ans=0.0 2023-11-26 10:37:40,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3349780.0, ans=0.1 2023-11-26 10:37:46,029 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9500, loss[loss=0.06835, simple_loss=0.08716, pruned_loss=0.01325, audio_tagging_loss=0.01151, over 14696.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09063, pruned_loss=0.01242, audio_tagging_loss=0.008813, over 3055216.91 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:37:47,602 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.787e+01 9.530e+01 1.013e+02 1.442e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 10:37:54,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3349846.6666666665, ans=10.0 2023-11-26 10:37:55,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3349846.6666666665, ans=0.125 2023-11-26 10:38:07,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3349980.0, ans=0.1 2023-11-26 10:38:09,487 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502500 2023-11-26 10:38:13,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3349980.0, ans=0.0 2023-11-26 10:38:42,487 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9550, loss[loss=0.06622, simple_loss=0.09833, pruned_loss=0.008887, audio_tagging_loss=0.008167, over 15372.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09013, pruned_loss=0.0122, audio_tagging_loss=0.008943, over 3059323.79 frames. 
], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:38:50,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3350180.0, ans=0.125 2023-11-26 10:38:58,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3350246.6666666665, ans=0.125 2023-11-26 10:39:05,365 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502550 2023-11-26 10:39:07,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3350313.3333333335, ans=0.025 2023-11-26 10:39:08,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3350313.3333333335, ans=0.1 2023-11-26 10:39:12,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3350313.3333333335, ans=0.125 2023-11-26 10:39:25,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3350380.0, ans=0.2 2023-11-26 10:39:26,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3350446.6666666665, ans=0.125 2023-11-26 10:39:29,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3350446.6666666665, ans=0.0 2023-11-26 10:39:37,572 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9600, loss[loss=0.04766, simple_loss=0.0625, pruned_loss=0.007811, audio_tagging_loss=0.008599, over 14721.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08955, pruned_loss=0.01215, audio_tagging_loss=0.009133, over 3061572.14 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:39:38,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.684e+01 9.310e+01 1.004e+02 1.298e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 10:39:56,915 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:39:59,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3350646.6666666665, ans=0.125 2023-11-26 10:40:01,557 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502600 2023-11-26 10:40:04,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.76 vs. 
limit=15.0 2023-11-26 10:40:08,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3350646.6666666665, ans=0.125 2023-11-26 10:40:09,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3350646.6666666665, ans=0.0 2023-11-26 10:40:16,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3350713.3333333335, ans=0.125 2023-11-26 10:40:18,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3350713.3333333335, ans=0.125 2023-11-26 10:40:20,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3350713.3333333335, ans=0.0 2023-11-26 10:40:20,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3350713.3333333335, ans=0.1 2023-11-26 10:40:33,886 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9650, loss[loss=0.0698, simple_loss=0.09227, pruned_loss=0.01756, audio_tagging_loss=0.006102, over 13767.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08988, pruned_loss=0.01223, audio_tagging_loss=0.009011, over 3054002.12 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:40:43,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3350846.6666666665, ans=0.0 2023-11-26 10:40:57,894 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502650 2023-11-26 10:40:59,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3350980.0, ans=0.125 2023-11-26 10:41:04,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3350980.0, ans=0.0 2023-11-26 10:41:15,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3351046.6666666665, ans=0.0 2023-11-26 10:41:17,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.75 vs. limit=10.0 2023-11-26 10:41:17,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3351113.3333333335, ans=0.125 2023-11-26 10:41:26,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-11-26 10:41:30,417 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9700, loss[loss=0.1, simple_loss=0.1399, pruned_loss=0.02357, audio_tagging_loss=0.006459, over 15647.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09119, pruned_loss=0.01247, audio_tagging_loss=0.008798, over 3047646.64 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:41:31,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.780e+01 9.294e+01 1.006e+02 1.332e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 10:41:32,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. 
limit=8.0 2023-11-26 10:41:53,989 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502700 2023-11-26 10:41:56,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2023-11-26 10:42:08,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2023-11-26 10:42:26,561 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9750, loss[loss=0.06767, simple_loss=0.09802, pruned_loss=0.01235, audio_tagging_loss=0.006309, over 14783.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09126, pruned_loss=0.01255, audio_tagging_loss=0.008663, over 3048258.95 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:42:29,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3351513.3333333335, ans=0.0 2023-11-26 10:42:49,943 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502750 2023-11-26 10:42:53,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3351646.6666666665, ans=0.0 2023-11-26 10:42:56,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.65 vs. limit=12.0 2023-11-26 10:43:00,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3351713.3333333335, ans=0.1 2023-11-26 10:43:01,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3351713.3333333335, ans=0.125 2023-11-26 10:43:22,283 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9800, loss[loss=0.04692, simple_loss=0.06762, pruned_loss=0.007817, audio_tagging_loss=0.005292, over 13254.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09099, pruned_loss=0.01238, audio_tagging_loss=0.008569, over 3040799.88 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:43:23,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 9.016e+01 9.407e+01 1.014e+02 1.286e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 10:43:39,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3351913.3333333335, ans=0.125 2023-11-26 10:43:40,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3351913.3333333335, ans=0.1 2023-11-26 10:43:45,565 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502800 2023-11-26 10:44:11,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2023-11-26 10:44:14,080 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 10:44:18,866 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9850, loss[loss=0.07099, simple_loss=0.09711, pruned_loss=0.01214, audio_tagging_loss=0.01029, over 15653.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09106, pruned_loss=0.01233, audio_tagging_loss=0.008538, over 3043027.80 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:44:41,931 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502850 2023-11-26 10:44:49,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352313.3333333335, ans=0.1 2023-11-26 10:45:00,541 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:45:09,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3352446.6666666665, ans=15.0 2023-11-26 10:45:11,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352446.6666666665, ans=0.1 2023-11-26 10:45:14,246 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9900, loss[loss=0.0643, simple_loss=0.08754, pruned_loss=0.01299, audio_tagging_loss=0.007543, over 15139.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09158, pruned_loss=0.01241, audio_tagging_loss=0.00846, over 3044554.65 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:45:16,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.825e+01 9.361e+01 1.007e+02 1.352e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 10:45:18,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3352513.3333333335, ans=0.0 2023-11-26 10:45:30,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3352580.0, ans=0.2 2023-11-26 10:45:38,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502900 2023-11-26 10:45:45,712 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:45:46,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-26 10:45:47,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3352713.3333333335, ans=0.125 2023-11-26 10:45:49,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2023-11-26 10:45:58,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3352780.0, ans=0.125 2023-11-26 10:46:11,025 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9950, loss[loss=0.06549, simple_loss=0.09405, pruned_loss=0.009771, audio_tagging_loss=0.008695, over 16190.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09141, pruned_loss=0.0125, audio_tagging_loss=0.00843, over 3051764.78 frames. 
], batch size: 63, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:46:34,427 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502950 2023-11-26 10:46:38,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3352980.0, ans=0.2 2023-11-26 10:46:39,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3352980.0, ans=0.0 2023-11-26 10:46:43,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3353046.6666666665, ans=0.0 2023-11-26 10:47:00,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3353113.3333333335, ans=0.125 2023-11-26 10:47:00,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2023-11-26 10:47:06,957 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10000, loss[loss=0.06741, simple_loss=0.09008, pruned_loss=0.01302, audio_tagging_loss=0.009349, over 14362.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09114, pruned_loss=0.01241, audio_tagging_loss=0.008454, over 3047564.88 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:47:09,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.184e+01 8.822e+01 9.378e+01 1.009e+02 1.316e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 10:47:30,539 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503000 2023-11-26 10:47:58,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3353446.6666666665, ans=0.0 2023-11-26 10:48:03,174 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10050, loss[loss=0.06558, simple_loss=0.08942, pruned_loss=0.01038, audio_tagging_loss=0.01049, over 14394.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.0905, pruned_loss=0.01232, audio_tagging_loss=0.008557, over 3040585.72 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:48:13,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3353580.0, ans=0.125 2023-11-26 10:48:27,011 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503050 2023-11-26 10:48:39,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3353713.3333333335, ans=0.125 2023-11-26 10:48:53,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2023-11-26 10:48:59,504 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10100, loss[loss=0.06274, simple_loss=0.07828, pruned_loss=0.0141, audio_tagging_loss=0.009502, over 14119.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09056, pruned_loss=0.01234, audio_tagging_loss=0.008558, over 3044057.39 frames. 
], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:49:02,639 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.507e+01 9.128e+01 1.020e+02 1.362e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-26 10:49:17,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3353913.3333333335, ans=0.125 2023-11-26 10:49:22,973 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503100 2023-11-26 10:49:24,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3353980.0, ans=10.0 2023-11-26 10:49:31,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3354046.6666666665, ans=0.125 2023-11-26 10:49:36,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3354046.6666666665, ans=0.0 2023-11-26 10:49:45,235 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:49:53,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3354113.3333333335, ans=0.125 2023-11-26 10:49:55,338 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10150, loss[loss=0.05472, simple_loss=0.06945, pruned_loss=0.008387, audio_tagging_loss=0.0116, over 14892.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09109, pruned_loss=0.0123, audio_tagging_loss=0.00862, over 3049698.03 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:50:15,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3354246.6666666665, ans=0.5 2023-11-26 10:50:15,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3354246.6666666665, ans=0.1 2023-11-26 10:50:18,275 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503150 2023-11-26 10:50:22,570 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:50:33,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-11-26 10:50:50,990 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10200, loss[loss=0.05378, simple_loss=0.06138, pruned_loss=0.01027, audio_tagging_loss=0.01282, over 14829.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09066, pruned_loss=0.01225, audio_tagging_loss=0.008697, over 3054712.94 frames. 
], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:50:54,053 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.608e+01 9.223e+01 1.008e+02 1.287e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 10:50:54,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3354513.3333333335, ans=0.0 2023-11-26 10:51:00,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3354580.0, ans=0.035 2023-11-26 10:51:01,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.39 vs. limit=15.0 2023-11-26 10:51:13,278 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:51:14,400 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503200 2023-11-26 10:51:46,353 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10250, loss[loss=0.0585, simple_loss=0.08111, pruned_loss=0.009935, audio_tagging_loss=0.008011, over 15581.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08917, pruned_loss=0.01207, audio_tagging_loss=0.008819, over 3054635.71 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:52:10,956 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503250 2023-11-26 10:52:15,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2023-11-26 10:52:16,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3354980.0, ans=0.2 2023-11-26 10:52:43,315 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10300, loss[loss=0.07908, simple_loss=0.1155, pruned_loss=0.01455, audio_tagging_loss=0.00679, over 15789.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08949, pruned_loss=0.01226, audio_tagging_loss=0.008786, over 3054783.23 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:52:46,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.732e+01 9.378e+01 1.015e+02 1.295e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 10:53:06,366 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503300 2023-11-26 10:53:07,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=8.0 2023-11-26 10:53:39,419 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10350, loss[loss=0.06548, simple_loss=0.09028, pruned_loss=0.01064, audio_tagging_loss=0.009698, over 15863.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08953, pruned_loss=0.01236, audio_tagging_loss=0.008886, over 3057358.16 frames. 
], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:53:45,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3355513.3333333335, ans=0.125 2023-11-26 10:53:53,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3355580.0, ans=0.125 2023-11-26 10:54:02,162 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503350 2023-11-26 10:54:03,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3355646.6666666665, ans=0.07 2023-11-26 10:54:04,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3355646.6666666665, ans=0.125 2023-11-26 10:54:33,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3355846.6666666665, ans=0.2 2023-11-26 10:54:34,668 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10400, loss[loss=0.07965, simple_loss=0.1129, pruned_loss=0.01462, audio_tagging_loss=0.008572, over 15230.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08927, pruned_loss=0.01231, audio_tagging_loss=0.009012, over 3060777.13 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:54:37,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3355846.6666666665, ans=0.1 2023-11-26 10:54:37,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.834e+01 9.411e+01 9.985e+01 1.468e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 10:54:38,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3355846.6666666665, ans=0.125 2023-11-26 10:54:47,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3355913.3333333335, ans=0.125 2023-11-26 10:54:54,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3355913.3333333335, ans=0.125 2023-11-26 10:54:54,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3355913.3333333335, ans=0.125 2023-11-26 10:54:58,712 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503400 2023-11-26 10:54:59,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=12.0 2023-11-26 10:55:07,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3356046.6666666665, ans=0.125 2023-11-26 10:55:20,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3356113.3333333335, ans=0.0 2023-11-26 10:55:30,880 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10450, loss[loss=0.07976, simple_loss=0.1033, pruned_loss=0.01951, audio_tagging_loss=0.008589, over 15047.00 frames. ], tot_loss[loss=0.066, simple_loss=0.0893, pruned_loss=0.01238, audio_tagging_loss=0.008967, over 3051432.43 frames. 
], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:55:53,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3356313.3333333335, ans=0.125 2023-11-26 10:55:54,508 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503450 2023-11-26 10:56:04,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0 2023-11-26 10:56:26,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3356513.3333333335, ans=0.0 2023-11-26 10:56:27,223 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10500, loss[loss=0.07045, simple_loss=0.1023, pruned_loss=0.01184, audio_tagging_loss=0.007468, over 15796.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08966, pruned_loss=0.01242, audio_tagging_loss=0.008892, over 3056444.27 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:56:30,346 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.592e+01 9.300e+01 9.951e+01 1.449e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 10:56:50,271 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503500 2023-11-26 10:57:04,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3356713.3333333335, ans=0.125 2023-11-26 10:57:22,551 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10550, loss[loss=0.06468, simple_loss=0.08795, pruned_loss=0.01199, audio_tagging_loss=0.008721, over 14476.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08969, pruned_loss=0.01228, audio_tagging_loss=0.008804, over 3050225.35 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:57:33,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3356913.3333333335, ans=0.0 2023-11-26 10:57:39,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3356913.3333333335, ans=0.125 2023-11-26 10:57:47,206 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503550 2023-11-26 10:58:01,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3357046.6666666665, ans=0.125 2023-11-26 10:58:02,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3357046.6666666665, ans=0.125 2023-11-26 10:58:05,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2023-11-26 10:58:17,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3357180.0, ans=0.125 2023-11-26 10:58:18,535 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10600, loss[loss=0.07193, simple_loss=0.1015, pruned_loss=0.01575, audio_tagging_loss=0.00543, over 15691.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08992, pruned_loss=0.0124, audio_tagging_loss=0.008764, over 3051128.98 frames. 
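
The four numbers inside each loss[...] / tot_loss[...] bracket are not independent: throughout this log they are consistent with the total being 0.5 x simple_loss + pruned_loss + 1.0 x audio_tagging_loss. The scales are inferred from the printed values, so treat this as an observation about this run rather than the trainer's formula; a quick check against the batch 10500 entry above:

    # Scales inferred from this log's printed values, not read from the code:
    # loss = 0.5 * simple_loss + 1.0 * pruned_loss + 1.0 * audio_tagging_loss
    simple, pruned, audio_tagging = 0.1023, 0.01184, 0.007468   # batch 10500
    loss = 0.5 * simple + 1.0 * pruned + 1.0 * audio_tagging
    assert abs(loss - 0.07045) < 1e-4
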
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:58:19,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357180.0, ans=0.1 2023-11-26 10:58:23,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.911e+01 9.725e+01 1.038e+02 1.409e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-26 10:58:25,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3357180.0, ans=0.125 2023-11-26 10:58:30,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3357246.6666666665, ans=0.0 2023-11-26 10:58:40,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3357313.3333333335, ans=0.2 2023-11-26 10:58:42,553 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503600 2023-11-26 10:58:52,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3357380.0, ans=0.025 2023-11-26 10:58:58,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3357380.0, ans=0.025 2023-11-26 10:59:15,785 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10650, loss[loss=0.0587, simple_loss=0.08123, pruned_loss=0.008601, audio_tagging_loss=0.00948, over 14805.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09032, pruned_loss=0.01245, audio_tagging_loss=0.008756, over 3050025.20 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:59:15,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3357513.3333333335, ans=0.125 2023-11-26 10:59:19,242 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:59:23,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3357513.3333333335, ans=0.2 2023-11-26 10:59:24,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3357513.3333333335, ans=0.2 2023-11-26 10:59:35,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3357580.0, ans=0.125 2023-11-26 10:59:38,842 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503650 2023-11-26 10:59:40,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. 
limit=6.0 2023-11-26 10:59:47,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3357713.3333333335, ans=0.0 2023-11-26 10:59:54,353 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:00:07,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3357780.0, ans=0.1 2023-11-26 11:00:09,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3357846.6666666665, ans=0.0 2023-11-26 11:00:10,589 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10700, loss[loss=0.08296, simple_loss=0.1212, pruned_loss=0.01402, audio_tagging_loss=0.008321, over 15763.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09156, pruned_loss=0.01263, audio_tagging_loss=0.008642, over 3048189.65 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:00:14,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.978e+01 9.499e+01 1.034e+02 2.026e+02, threshold=1.900e+02, percent-clipped=1.0 2023-11-26 11:00:18,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357846.6666666665, ans=0.1 2023-11-26 11:00:27,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3357913.3333333335, ans=0.125 2023-11-26 11:00:32,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3357980.0, ans=0.2 2023-11-26 11:00:34,241 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503700 2023-11-26 11:00:45,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3358046.6666666665, ans=0.0 2023-11-26 11:01:06,436 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10750, loss[loss=0.0689, simple_loss=0.09906, pruned_loss=0.01174, audio_tagging_loss=0.00764, over 14905.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09058, pruned_loss=0.01251, audio_tagging_loss=0.008547, over 3049494.15 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:01:12,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3358180.0, ans=0.0 2023-11-26 11:01:17,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3358246.6666666665, ans=0.125 2023-11-26 11:01:30,442 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503750 2023-11-26 11:01:39,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3358380.0, ans=0.1 2023-11-26 11:01:43,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3358380.0, ans=0.5 2023-11-26 11:02:02,707 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10800, loss[loss=0.05145, simple_loss=0.06726, pruned_loss=0.0087, audio_tagging_loss=0.009117, over 14797.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.0908, pruned_loss=0.01246, audio_tagging_loss=0.008474, over 3053559.01 frames. 
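
The grad_scale column in the progress lines bounces between 8.0, 16.0 and 32.0: that is the dynamic loss scale of fp16 mixed-precision training, which doubles after a stretch of overflow-free steps and halves when an overflow is detected. A minimal sketch of the standard torch.cuda.amp pattern (loader, model, optimizer and compute_loss are assumed to exist; the trainer's actual wiring may differ):

    # Hedged sketch of the fp16 loss scaling behind the grad_scale column.
    import torch

    scaler = torch.cuda.amp.GradScaler()    # the printed grad_scale
    for batch in loader:                    # 'loader', 'model', 'optimizer'
        optimizer.zero_grad()               #  are assumed defined
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)   # hypothetical helper
        scaler.scale(loss).backward()
        scaler.step(optimizer)              # skipped if grads overflowed
        scaler.update()                     # grows or halves the scale
        print(scaler.get_scale())           # e.g. 16.0 or 32.0, as above
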
], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:02:07,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.688e+01 9.211e+01 9.904e+01 2.001e+02, threshold=1.842e+02, percent-clipped=1.0 2023-11-26 11:02:17,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3358580.0, ans=0.0 2023-11-26 11:02:25,502 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503800 2023-11-26 11:02:27,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3358646.6666666665, ans=0.0 2023-11-26 11:02:36,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3358713.3333333335, ans=0.0 2023-11-26 11:02:36,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3358713.3333333335, ans=0.2 2023-11-26 11:02:58,739 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10850, loss[loss=0.07516, simple_loss=0.1046, pruned_loss=0.01465, audio_tagging_loss=0.008222, over 15725.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09018, pruned_loss=0.01241, audio_tagging_loss=0.00856, over 3052461.15 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:03:02,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3358846.6666666665, ans=0.07 2023-11-26 11:03:14,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3358913.3333333335, ans=0.0 2023-11-26 11:03:22,360 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503850 2023-11-26 11:03:40,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3359046.6666666665, ans=0.1 2023-11-26 11:03:52,563 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:03:54,692 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10900, loss[loss=0.05946, simple_loss=0.08301, pruned_loss=0.009349, audio_tagging_loss=0.008606, over 15390.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09053, pruned_loss=0.0124, audio_tagging_loss=0.008622, over 3049948.52 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:03:58,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 8.941e+01 9.584e+01 1.034e+02 1.250e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 11:04:00,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.50 vs. 
limit=10.0 2023-11-26 11:04:16,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3359313.3333333335, ans=0.125 2023-11-26 11:04:18,862 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503900 2023-11-26 11:04:50,468 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10950, loss[loss=0.06348, simple_loss=0.09102, pruned_loss=0.01024, audio_tagging_loss=0.007733, over 15870.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09044, pruned_loss=0.01239, audio_tagging_loss=0.008646, over 3049924.72 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:04:59,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3359513.3333333335, ans=0.125 2023-11-26 11:05:07,623 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:05:10,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3359580.0, ans=22.5 2023-11-26 11:05:13,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503950 2023-11-26 11:05:21,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3359646.6666666665, ans=0.125 2023-11-26 11:05:28,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2023-11-26 11:05:29,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3359713.3333333335, ans=0.0 2023-11-26 11:05:29,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3359713.3333333335, ans=0.125 2023-11-26 11:05:34,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3359780.0, ans=0.1 2023-11-26 11:05:37,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=12.0 2023-11-26 11:05:41,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3359780.0, ans=0.125 2023-11-26 11:05:46,842 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11000, loss[loss=0.0699, simple_loss=0.09121, pruned_loss=0.01391, audio_tagging_loss=0.01038, over 15763.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09075, pruned_loss=0.01259, audio_tagging_loss=0.008668, over 3046602.16 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:05:52,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.584e+01 9.485e+01 1.002e+02 1.136e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 11:05:56,474 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 11:05:58,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3359913.3333333335, ans=0.0 2023-11-26 11:06:04,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3359913.3333333335, ans=0.0 2023-11-26 11:06:04,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3359913.3333333335, ans=0.2 2023-11-26 11:06:10,136 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504000 2023-11-26 11:06:11,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3359980.0, ans=0.125 2023-11-26 11:06:24,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3360046.6666666665, ans=0.0 2023-11-26 11:06:29,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3360046.6666666665, ans=0.0 2023-11-26 11:06:32,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3360113.3333333335, ans=0.0 2023-11-26 11:06:38,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3360113.3333333335, ans=0.125 2023-11-26 11:06:44,385 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11050, loss[loss=0.05599, simple_loss=0.06968, pruned_loss=0.009938, audio_tagging_loss=0.01121, over 15523.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09069, pruned_loss=0.01251, audio_tagging_loss=0.008729, over 3043698.42 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:07:08,398 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504050 2023-11-26 11:07:15,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.62 vs. limit=22.5 2023-11-26 11:07:17,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3360380.0, ans=0.1 2023-11-26 11:07:22,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3360380.0, ans=0.125 2023-11-26 11:07:23,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3360380.0, ans=0.1 2023-11-26 11:07:32,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3360446.6666666665, ans=0.125 2023-11-26 11:07:40,686 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11100, loss[loss=0.03428, simple_loss=0.03244, pruned_loss=0.007364, audio_tagging_loss=0.0107, over 15034.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08977, pruned_loss=0.0125, audio_tagging_loss=0.008829, over 3043955.21 frames. 
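
The recurring "Exclude cut" warnings are the transducer's validity check: after the encoder's roughly 4x convolutional subsampling, a cut must keep at least as many frames as it has BPE tokens, and the 1-second AudioSet cuts carry a dummy transcript that tokenizes to 24 pieces while their 100 input frames survive only as 23. A sketch of the arithmetic (the exact subsampling formula is an assumption, chosen to reproduce the 100 -> 23 mapping in the warnings):

    # Hedged sketch of the check behind the "Exclude cut" warnings.
    def frames_after_subsampling(t: int) -> int:
        return ((t - 7) // 2) // 2      # assumed; maps 100 -> 23 as logged

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # a transducer needs T' >= U to be able to emit every target token
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)        # the dummy-text AudioSet cuts
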
], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:07:42,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3360513.3333333335, ans=0.1 2023-11-26 11:07:46,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.699e+01 9.322e+01 9.971e+01 1.375e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-26 11:08:04,223 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504100 2023-11-26 11:08:36,571 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11150, loss[loss=0.07755, simple_loss=0.1045, pruned_loss=0.01476, audio_tagging_loss=0.01055, over 14817.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.0894, pruned_loss=0.01239, audio_tagging_loss=0.008997, over 3039790.92 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:08:45,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3360846.6666666665, ans=0.125 2023-11-26 11:09:00,117 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504150 2023-11-26 11:09:03,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3360980.0, ans=0.0 2023-11-26 11:09:08,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3360980.0, ans=0.0 2023-11-26 11:09:09,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3361046.6666666665, ans=0.0 2023-11-26 11:09:23,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3361113.3333333335, ans=0.0 2023-11-26 11:09:24,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3361113.3333333335, ans=0.125 2023-11-26 11:09:32,594 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11200, loss[loss=0.07308, simple_loss=0.08703, pruned_loss=0.01784, audio_tagging_loss=0.01173, over 14242.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08954, pruned_loss=0.01248, audio_tagging_loss=0.009143, over 3044125.35 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:09:39,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.765e+01 9.322e+01 9.953e+01 1.270e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-26 11:09:48,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0 2023-11-26 11:09:55,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.72 vs. 
limit=6.0 2023-11-26 11:09:56,414 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504200 2023-11-26 11:10:08,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3361380.0, ans=0.2 2023-11-26 11:10:09,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3361380.0, ans=0.09899494936611666 2023-11-26 11:10:20,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3361446.6666666665, ans=0.1 2023-11-26 11:10:22,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3361446.6666666665, ans=0.1 2023-11-26 11:10:26,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3361446.6666666665, ans=0.125 2023-11-26 11:10:28,730 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11250, loss[loss=0.07117, simple_loss=0.09887, pruned_loss=0.01342, audio_tagging_loss=0.008316, over 15082.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08947, pruned_loss=0.01249, audio_tagging_loss=0.009083, over 3052189.54 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:10:39,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.89 vs. limit=6.0 2023-11-26 11:10:41,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3361580.0, ans=0.2 2023-11-26 11:10:51,751 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504250 2023-11-26 11:11:19,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2023-11-26 11:11:20,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3361780.0, ans=0.0 2023-11-26 11:11:24,500 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11300, loss[loss=0.07196, simple_loss=0.09691, pruned_loss=0.01564, audio_tagging_loss=0.007867, over 14511.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08953, pruned_loss=0.01243, audio_tagging_loss=0.008992, over 3054261.67 frames. 
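
The scaling.py:1022 "Whitening" entries compare a per-module metric against a limit (6.0, 8.0, 12.0, 15.0 or 22.5 above); the metric is about 1.0 when the channel covariance is close to a multiple of the identity and grows as activations collapse onto fewer directions, and the module only pushes back when the limit is exceeded. A toy version of such a metric, assuming the eigenvalue-spread interpretation rather than icefall's exact formula:

    # Hedged sketch of a whitening metric: >= 1, and equal to 1 for a
    # perfectly white (identity-covariance) feature distribution.
    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels); the real module also handles groups
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2

    white = torch.randn(4000, 192)
    assert whitening_metric(white) < 1.3    # near 1 for white noise
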
], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:11:24,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3361846.6666666665, ans=0.0 2023-11-26 11:11:30,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3361846.6666666665, ans=0.0 2023-11-26 11:11:30,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.741e+01 9.355e+01 1.007e+02 1.157e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 11:11:45,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3361913.3333333335, ans=0.125 2023-11-26 11:11:46,180 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:11:48,063 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504300 2023-11-26 11:11:48,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3361980.0, ans=0.125 2023-11-26 11:12:01,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2023-11-26 11:12:09,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3362113.3333333335, ans=0.125 2023-11-26 11:12:19,982 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11350, loss[loss=0.06795, simple_loss=0.09484, pruned_loss=0.01269, audio_tagging_loss=0.007835, over 14651.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08952, pruned_loss=0.01247, audio_tagging_loss=0.008916, over 3053551.10 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:12:20,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3362180.0, ans=0.0 2023-11-26 11:12:26,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3362180.0, ans=0.125 2023-11-26 11:12:27,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5 2023-11-26 11:12:38,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3362246.6666666665, ans=0.0 2023-11-26 11:12:44,465 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504350 2023-11-26 11:13:05,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3362446.6666666665, ans=0.125 2023-11-26 11:13:11,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3362446.6666666665, ans=0.0 2023-11-26 11:13:15,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3362513.3333333335, ans=0.125 2023-11-26 11:13:15,814 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11400, loss[loss=0.07205, simple_loss=0.1084, pruned_loss=0.01219, audio_tagging_loss=0.00566, over 15025.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09006, pruned_loss=0.01262, audio_tagging_loss=0.008731, over 3048289.94 frames. 
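
The scaling.py:1118 "WithLoss" entries (loss-sum=0.000e+00 above) report an auxiliary penalty attached to the attention weights. A common way to attach such penalties is an autograd function that returns its input unchanged in the forward pass and adds the penalty's gradient in the backward pass, so the main loss value never includes the extra term; a logged sum of 0.0 then just means the penalty is currently inactive. A generic sketch of that pattern, not icefall's implementation:

    # Hedged sketch of a pass-through that injects an auxiliary loss gradient.
    import torch

    class WithAuxLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, aux_grad):
            ctx.save_for_backward(aux_grad)
            return x                       # identity in the forward pass

        @staticmethod
        def backward(ctx, grad_out):
            (aux,) = ctx.saved_tensors
            return grad_out + aux, None    # main grad plus penalty grad

    x = torch.randn(3, requires_grad=True)
    y = WithAuxLoss.apply(x, torch.full_like(x, 0.01))
    y.sum().backward()
    print(x.grad)                          # each element is 1.0 + 0.01
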
], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:13:23,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.698e+01 9.567e+01 1.043e+02 1.467e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 11:13:28,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3362580.0, ans=0.125 2023-11-26 11:13:39,000 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504400 2023-11-26 11:13:52,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2023-11-26 11:14:12,483 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11450, loss[loss=0.07957, simple_loss=0.1071, pruned_loss=0.01784, audio_tagging_loss=0.008204, over 15627.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08965, pruned_loss=0.01244, audio_tagging_loss=0.008691, over 3045607.62 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:14:26,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3362913.3333333335, ans=0.125 2023-11-26 11:14:35,922 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504450 2023-11-26 11:14:52,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3363046.6666666665, ans=0.2 2023-11-26 11:14:55,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3363046.6666666665, ans=0.125 2023-11-26 11:14:58,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3363113.3333333335, ans=0.025 2023-11-26 11:15:07,654 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11500, loss[loss=0.0681, simple_loss=0.09264, pruned_loss=0.01419, audio_tagging_loss=0.007592, over 16716.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08962, pruned_loss=0.01227, audio_tagging_loss=0.00876, over 3050164.66 frames. ], batch size: 64, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:15:12,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2023-11-26 11:15:15,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.382e+01 8.706e+01 9.345e+01 1.007e+02 1.360e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 11:15:15,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3363180.0, ans=0.125 2023-11-26 11:15:22,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2023-11-26 11:15:31,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504500 2023-11-26 11:16:03,941 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11550, loss[loss=0.06534, simple_loss=0.09266, pruned_loss=0.009828, audio_tagging_loss=0.009181, over 15237.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08981, pruned_loss=0.01222, audio_tagging_loss=0.008708, over 3054681.00 frames. 
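
In each progress line, loss[...] is the current batch while tot_loss[...] is a decayed running sum: the accumulated statistics appear to be multiplied by (1 - 1/200) before each new batch is added, which is why the steady-state frame count hovers near 200 batches' worth (about 3.05M frames at roughly 15.5k frames per batch) and rebuilds gradually after an epoch boundary. A sketch of that bookkeeping, with the 1/200 decay inferred from the frame counts in this log rather than read from the code:

    # Hedged sketch of the decayed "tot_loss" statistics.
    DECAY = 1.0 - 1.0 / 200.0              # inferred from the frame counts

    tot_frames, tot_loss_sum = 0.0, 0.0

    def update(batch_frames: float, batch_loss: float) -> float:
        global tot_frames, tot_loss_sum
        tot_frames = tot_frames * DECAY + batch_frames
        tot_loss_sum = tot_loss_sum * DECAY + batch_loss * batch_frames
        return tot_loss_sum / tot_frames   # the printed tot_loss value

    for _ in range(2000):                  # long past the start of the epoch
        update(15500.0, 0.066)
    print(tot_frames)                      # ~3.1e6, matching the log
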
], batch size: 56, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:16:12,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0 2023-11-26 11:16:27,545 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504550 2023-11-26 11:16:35,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3363646.6666666665, ans=0.0 2023-11-26 11:16:38,069 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:16:39,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3363713.3333333335, ans=0.0 2023-11-26 11:16:55,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3363780.0, ans=0.2 2023-11-26 11:17:00,352 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11600, loss[loss=0.05715, simple_loss=0.07969, pruned_loss=0.00639, audio_tagging_loss=0.01091, over 15097.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09003, pruned_loss=0.01236, audio_tagging_loss=0.008743, over 3049663.23 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:17:03,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=22.5 2023-11-26 11:17:08,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.748e+01 9.551e+01 1.048e+02 1.358e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 11:17:12,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3363913.3333333335, ans=0.1 2023-11-26 11:17:22,565 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504600 2023-11-26 11:17:24,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3363980.0, ans=0.125 2023-11-26 11:17:31,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5 2023-11-26 11:17:32,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3364046.6666666665, ans=0.0 2023-11-26 11:17:55,415 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11650, loss[loss=0.07001, simple_loss=0.09937, pruned_loss=0.01281, audio_tagging_loss=0.007516, over 14705.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09012, pruned_loss=0.01231, audio_tagging_loss=0.008649, over 3049271.43 frames. 
], batch size: 55, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:17:58,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3364180.0, ans=0.125 2023-11-26 11:18:01,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3364180.0, ans=0.1 2023-11-26 11:18:01,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3364180.0, ans=0.125 2023-11-26 11:18:04,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3364180.0, ans=0.0 2023-11-26 11:18:09,061 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:18:14,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3364246.6666666665, ans=0.0 2023-11-26 11:18:19,677 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504650 2023-11-26 11:18:22,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3364313.3333333335, ans=0.1 2023-11-26 11:18:25,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=22.5 2023-11-26 11:18:26,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3364313.3333333335, ans=0.2 2023-11-26 11:18:32,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0 2023-11-26 11:18:35,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3364380.0, ans=0.1 2023-11-26 11:18:51,431 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11700, loss[loss=0.07839, simple_loss=0.1087, pruned_loss=0.01619, audio_tagging_loss=0.007843, over 15079.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09008, pruned_loss=0.01237, audio_tagging_loss=0.008659, over 3055405.30 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:18:57,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3364513.3333333335, ans=0.2 2023-11-26 11:19:00,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.724e+01 9.353e+01 9.996e+01 1.834e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 11:19:13,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3364646.6666666665, ans=0.125 2023-11-26 11:19:15,246 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504700 2023-11-26 11:19:31,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3364713.3333333335, ans=0.125 2023-11-26 11:19:44,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3364780.0, ans=0.125 2023-11-26 11:19:47,702 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11750, loss[loss=0.06173, simple_loss=0.08228, pruned_loss=0.01223, audio_tagging_loss=0.00836, over 14747.00 frames. 
], tot_loss[loss=0.06664, simple_loss=0.09085, pruned_loss=0.01261, audio_tagging_loss=0.008606, over 3053540.52 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:20:02,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2023-11-26 11:20:10,594 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504750 2023-11-26 11:20:23,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3365046.6666666665, ans=0.125 2023-11-26 11:20:36,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3365113.3333333335, ans=0.0 2023-11-26 11:20:36,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2023-11-26 11:20:41,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3365113.3333333335, ans=0.125 2023-11-26 11:20:43,318 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11800, loss[loss=0.07348, simple_loss=0.103, pruned_loss=0.01576, audio_tagging_loss=0.006211, over 15439.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09066, pruned_loss=0.01259, audio_tagging_loss=0.008657, over 3046027.36 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:20:43,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3365180.0, ans=0.1 2023-11-26 11:20:51,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.955e+01 9.712e+01 1.042e+02 1.352e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-26 11:21:04,706 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:21:06,722 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504800 2023-11-26 11:21:14,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3365313.3333333335, ans=0.07 2023-11-26 11:21:26,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3365380.0, ans=0.125 2023-11-26 11:21:30,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3365446.6666666665, ans=0.1 2023-11-26 11:21:39,229 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11850, loss[loss=0.06287, simple_loss=0.09087, pruned_loss=0.009026, audio_tagging_loss=0.008411, over 15223.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09024, pruned_loss=0.01253, audio_tagging_loss=0.008844, over 3043973.96 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:21:40,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.99 vs. 
limit=22.5 2023-11-26 11:21:42,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3365513.3333333335, ans=0.0 2023-11-26 11:21:45,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3365513.3333333335, ans=0.1 2023-11-26 11:22:03,166 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504850 2023-11-26 11:22:15,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3365713.3333333335, ans=0.1 2023-11-26 11:22:22,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3365780.0, ans=0.2 2023-11-26 11:22:29,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3365780.0, ans=0.125 2023-11-26 11:22:34,575 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11900, loss[loss=0.0644, simple_loss=0.09302, pruned_loss=0.00838, audio_tagging_loss=0.009515, over 14643.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.0904, pruned_loss=0.01247, audio_tagging_loss=0.008938, over 3041994.96 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:22:43,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2023-11-26 11:22:43,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2023-11-26 11:22:44,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.882e+01 9.383e+01 1.003e+02 1.296e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 11:22:53,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3365913.3333333335, ans=0.125 2023-11-26 11:22:58,070 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504900 2023-11-26 11:23:05,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3365980.0, ans=0.125 2023-11-26 11:23:06,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3365980.0, ans=0.125 2023-11-26 11:23:12,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3366046.6666666665, ans=0.0 2023-11-26 11:23:30,284 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11950, loss[loss=0.06013, simple_loss=0.07835, pruned_loss=0.01184, audio_tagging_loss=0.009122, over 14292.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08992, pruned_loss=0.01251, audio_tagging_loss=0.009015, over 3042293.91 frames. 
], batch size: 55, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:23:31,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3366180.0, ans=22.5 2023-11-26 11:23:47,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3366246.6666666665, ans=0.0 2023-11-26 11:23:53,597 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504950 2023-11-26 11:23:58,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2023-11-26 11:24:05,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2023-11-26 11:24:19,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3366446.6666666665, ans=0.0 2023-11-26 11:24:22,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3366446.6666666665, ans=0.1 2023-11-26 11:24:23,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=15.0 2023-11-26 11:24:24,558 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 12000, loss[loss=0.08462, simple_loss=0.1162, pruned_loss=0.0157, audio_tagging_loss=0.01083, over 15047.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09028, pruned_loss=0.01249, audio_tagging_loss=0.009061, over 3051187.22 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:24:24,559 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 11:24:39,738 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3118, 3.1234, 2.6146, 2.8072], device='cuda:1') 2023-11-26 11:24:57,246 INFO [train_asr.py:1267] (1/4) Epoch 42, validation: loss=0.05796, simple_loss=0.05063, pruned_loss=0.005274, audio_tagging_loss=0.02738, over 4681554.00 frames. 2023-11-26 11:24:57,247 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 11:25:05,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.933e+01 9.493e+01 1.025e+02 1.345e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 11:25:06,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3366580.0, ans=0.125 2023-11-26 11:25:19,322 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505000 2023-11-26 11:25:20,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3366646.6666666665, ans=0.125 2023-11-26 11:25:50,628 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 0, loss[loss=0.07526, simple_loss=0.08115, pruned_loss=0.01362, audio_tagging_loss=0.02106, over 15048.00 frames. ], tot_loss[loss=0.07526, simple_loss=0.08115, pruned_loss=0.01362, audio_tagging_loss=0.02106, over 15048.00 frames. 
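
At batch 12000 (and again at the start of epoch 43) the trainer pauses for its periodic validation pass over the fixed 4,681,554-frame dev set: "Computing validation loss", the per-layer attention diagnostics, the aggregated validation losses, and the peak CUDA memory (25568MB here). A minimal sketch of such a pass; the names are illustrative and compute_loss is a hypothetical helper, though the real loop aggregates the same frame-weighted statistics as training:

    # Hedged sketch of the periodic validation pass logged above.
    import torch

    def validate(model, dev_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in dev_loader:
                loss, num_frames = compute_loss(model, batch)  # hypothetical
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        print(f"validation: loss={tot_loss / tot_frames:.4g}, "
              f"over {tot_frames:.2f} frames.")
        print(f"Maximum memory allocated so far is "
              f"{torch.cuda.max_memory_allocated(device) // 2**20}MB")
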
], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:25:50,629 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 11:26:02,834 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.7786, 5.4790, 5.1861, 5.2538], device='cuda:1') 2023-11-26 11:26:21,921 INFO [train_asr.py:1267] (1/4) Epoch 43, validation: loss=0.05779, simple_loss=0.05063, pruned_loss=0.005275, audio_tagging_loss=0.0272, over 4681554.00 frames. 2023-11-26 11:26:21,922 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 11:26:48,176 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:26:51,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3366806.6666666665, ans=0.0 2023-11-26 11:27:01,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-26 11:27:14,021 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505050 2023-11-26 11:27:17,149 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 50, loss[loss=0.06356, simple_loss=0.06573, pruned_loss=0.01056, audio_tagging_loss=0.02013, over 15047.00 frames. ], tot_loss[loss=0.0741, simple_loss=0.08918, pruned_loss=0.01243, audio_tagging_loss=0.01708, over 685217.20 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:27:34,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3367073.3333333335, ans=22.5 2023-11-26 11:27:37,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3367073.3333333335, ans=0.09899494936611666 2023-11-26 11:27:51,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3367206.6666666665, ans=0.1 2023-11-26 11:27:56,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.204e+01 9.484e+01 1.020e+02 1.096e+02 2.411e+02, threshold=2.041e+02, percent-clipped=1.0 2023-11-26 11:28:09,904 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505100 2023-11-26 11:28:13,116 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 100, loss[loss=0.09139, simple_loss=0.1213, pruned_loss=0.02032, audio_tagging_loss=0.01043, over 14797.00 frames. ], tot_loss[loss=0.07415, simple_loss=0.09086, pruned_loss=0.01251, audio_tagging_loss=0.01621, over 1209685.11 frames. 
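
The zipformer.py:1877 lines printed during validation are a health check on the self-attention modules: the entropy of each head's attention distribution, which sits near log(seq_len) for diffuse heads and near 0 for heads locked onto single positions. A sketch of the computation (the tensor shapes are assumptions):

    # Hedged sketch of the attn_weights_entropy diagnostic: mean entropy
    # (in nats) of each attention head's weight rows.
    import torch

    def attention_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, tgt_len, src_len), each row a distribution
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
        return ent.mean(dim=-1)             # one entropy value per head

    w = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attention_entropy(w))             # a bit below log(50) ~ 3.9
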
], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:28:21,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3367340.0, ans=0.125 2023-11-26 11:28:31,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3367406.6666666665, ans=0.2 2023-11-26 11:28:47,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3367540.0, ans=0.0 2023-11-26 11:28:52,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3367540.0, ans=0.1 2023-11-26 11:28:58,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3367606.6666666665, ans=0.125 2023-11-26 11:29:00,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3367606.6666666665, ans=0.0 2023-11-26 11:29:02,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=15.0 2023-11-26 11:29:06,283 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505150 2023-11-26 11:29:09,530 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 150, loss[loss=0.04394, simple_loss=0.05003, pruned_loss=0.004426, audio_tagging_loss=0.0145, over 16880.00 frames. ], tot_loss[loss=0.0716, simple_loss=0.08963, pruned_loss=0.0122, audio_tagging_loss=0.01458, over 1624031.56 frames. ], batch size: 66, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:29:23,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2023-11-26 11:29:49,993 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 9.187e+01 9.762e+01 1.032e+02 1.254e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-26 11:29:55,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3367940.0, ans=0.125 2023-11-26 11:29:57,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3367940.0, ans=0.1 2023-11-26 11:30:01,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3367940.0, ans=0.0 2023-11-26 11:30:02,315 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505200 2023-11-26 11:30:05,738 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 200, loss[loss=0.0535, simple_loss=0.06506, pruned_loss=0.00917, audio_tagging_loss=0.0118, over 15807.00 frames. ], tot_loss[loss=0.07005, simple_loss=0.08976, pruned_loss=0.01233, audio_tagging_loss=0.01284, over 1942335.46 frames. ], batch size: 63, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:30:16,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.54 vs. limit=10.0 2023-11-26 11:30:23,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3368073.3333333335, ans=0.125 2023-11-26 11:30:25,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.21 vs. 
limit=22.5 2023-11-26 11:30:25,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3368073.3333333335, ans=0.0 2023-11-26 11:30:44,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3368206.6666666665, ans=0.125 2023-11-26 11:30:58,367 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505250 2023-11-26 11:31:02,031 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 250, loss[loss=0.03699, simple_loss=0.04421, pruned_loss=0.00396, audio_tagging_loss=0.01093, over 14912.00 frames. ], tot_loss[loss=0.06885, simple_loss=0.08985, pruned_loss=0.01232, audio_tagging_loss=0.01161, over 2183510.21 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:31:03,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3368340.0, ans=0.035 2023-11-26 11:31:26,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3368473.3333333335, ans=0.1 2023-11-26 11:31:28,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3368473.3333333335, ans=0.125 2023-11-26 11:31:36,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3368540.0, ans=0.1 2023-11-26 11:31:38,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3368540.0, ans=0.125 2023-11-26 11:31:41,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3368540.0, ans=0.1 2023-11-26 11:31:43,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.006e+01 9.717e+01 1.058e+02 1.490e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-26 11:31:54,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505300 2023-11-26 11:31:58,287 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 300, loss[loss=0.05587, simple_loss=0.07311, pruned_loss=0.00772, audio_tagging_loss=0.0116, over 14439.00 frames. ], tot_loss[loss=0.06847, simple_loss=0.09063, pruned_loss=0.01244, audio_tagging_loss=0.01072, over 2385629.52 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:32:05,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3368673.3333333335, ans=0.125 2023-11-26 11:32:26,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. 
limit=15.0 2023-11-26 11:32:27,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3368806.6666666665, ans=0.125 2023-11-26 11:32:32,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3368873.3333333335, ans=0.0 2023-11-26 11:32:36,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3368873.3333333335, ans=0.1 2023-11-26 11:32:50,419 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505350 2023-11-26 11:32:54,048 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 350, loss[loss=0.07166, simple_loss=0.1005, pruned_loss=0.01588, audio_tagging_loss=0.005517, over 15034.00 frames. ], tot_loss[loss=0.06827, simple_loss=0.09126, pruned_loss=0.01252, audio_tagging_loss=0.01012, over 2533684.92 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:33:18,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3369140.0, ans=0.2 2023-11-26 11:33:29,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3369206.6666666665, ans=0.1 2023-11-26 11:33:35,776 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.731e+01 9.201e+01 9.958e+01 1.413e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 11:33:46,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505400 2023-11-26 11:33:50,621 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 400, loss[loss=0.055, simple_loss=0.06806, pruned_loss=0.01105, audio_tagging_loss=0.009923, over 15896.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09054, pruned_loss=0.01249, audio_tagging_loss=0.009917, over 2646377.37 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:34:08,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2023-11-26 11:34:12,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2023-11-26 11:34:21,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3369473.3333333335, ans=0.0 2023-11-26 11:34:34,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3369606.6666666665, ans=0.0 2023-11-26 11:34:41,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3369606.6666666665, ans=0.125 2023-11-26 11:34:43,260 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505450 2023-11-26 11:34:46,991 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 450, loss[loss=0.04514, simple_loss=0.05522, pruned_loss=0.008673, audio_tagging_loss=0.008854, over 14788.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09121, pruned_loss=0.01266, audio_tagging_loss=0.00956, over 2738342.25 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:34:58,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.32 vs. 
limit=5.0 2023-11-26 11:35:16,725 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:35:27,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3369873.3333333335, ans=0.0 2023-11-26 11:35:27,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.871e+01 9.451e+01 1.026e+02 1.404e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 11:35:39,141 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505500 2023-11-26 11:35:39,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3369940.0, ans=0.2 2023-11-26 11:35:42,266 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 500, loss[loss=0.08485, simple_loss=0.1177, pruned_loss=0.0176, audio_tagging_loss=0.008384, over 15214.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09117, pruned_loss=0.01266, audio_tagging_loss=0.009288, over 2808094.48 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:35:47,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3370006.6666666665, ans=0.0 2023-11-26 11:35:59,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3370073.3333333335, ans=0.125 2023-11-26 11:36:07,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3370140.0, ans=0.0 2023-11-26 11:36:12,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370140.0, ans=0.1 2023-11-26 11:36:35,160 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505550 2023-11-26 11:36:38,289 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 550, loss[loss=0.06878, simple_loss=0.09756, pruned_loss=0.01285, audio_tagging_loss=0.007154, over 15317.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09073, pruned_loss=0.01241, audio_tagging_loss=0.009275, over 2860091.08 frames. 
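The per-batch loss[...] entries decompose consistently as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; for the batch-300 sample above, 0.5 * 0.07311 + 0.00772 + 0.0116 ≈ 0.05587, matching the logged value. A minimal sketch of that combination; the two scale factors are inferred from the logged numbers, not read from the training code:

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    # Inferred decomposition; the scales are assumptions that reproduce
    # the loss[...] entries here, e.g. 0.5*0.07311 + 0.00772 + 0.0116.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)
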
], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:36:49,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370406.6666666665, ans=0.1 2023-11-26 11:36:51,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3370406.6666666665, ans=0.125 2023-11-26 11:36:55,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3370406.6666666665, ans=0.125 2023-11-26 11:37:10,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3370540.0, ans=0.0 2023-11-26 11:37:19,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.880e+01 9.459e+01 1.018e+02 1.226e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 11:37:24,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3370606.6666666665, ans=0.2 2023-11-26 11:37:26,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3370606.6666666665, ans=6.0 2023-11-26 11:37:30,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3370606.6666666665, ans=0.125 2023-11-26 11:37:31,038 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505600 2023-11-26 11:37:34,742 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 600, loss[loss=0.06375, simple_loss=0.07997, pruned_loss=0.01428, audio_tagging_loss=0.009492, over 14912.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09007, pruned_loss=0.01228, audio_tagging_loss=0.00919, over 2899858.16 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:37:39,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3370673.3333333335, ans=0.125 2023-11-26 11:37:48,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3370740.0, ans=0.125 2023-11-26 11:37:56,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370806.6666666665, ans=0.1 2023-11-26 11:38:09,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3370873.3333333335, ans=0.125 2023-11-26 11:38:13,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3370873.3333333335, ans=0.1 2023-11-26 11:38:25,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3370940.0, ans=0.125 2023-11-26 11:38:27,374 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505650 2023-11-26 11:38:30,487 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 650, loss[loss=0.05995, simple_loss=0.06892, pruned_loss=0.01439, audio_tagging_loss=0.0111, over 14295.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09108, pruned_loss=0.01257, audio_tagging_loss=0.009028, over 2925580.06 frames. 
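The fractional batch_count values in the ScheduledFloat entries are not the raw batch index: multiplying the logged batch idx by world_size * max_duration / ref_duration reproduces them, e.g. 505600 * 4 * 1000 / 600 ≈ 3370667, in line with the batch_count ≈ 3370673 logged moments later. A sketch under those assumed settings (4 GPUs, max_duration 1000 s, ref_duration 600 s):

def adjusted_batch_count(batch_idx: int,
                         world_size: int = 4,
                         max_duration: float = 1000.0,
                         ref_duration: float = 600.0) -> float:
    # Assumed normalization: schedules see a batch count that is
    # comparable across batch sizes and data-parallel world sizes.
    return batch_idx * world_size * max_duration / ref_duration

print(adjusted_batch_count(505600))  # ~3370666.7, matching the log
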
], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:39:12,095 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.620e+01 9.335e+01 1.001e+02 1.278e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 11:39:22,910 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505700 2023-11-26 11:39:26,076 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 700, loss[loss=0.05877, simple_loss=0.08253, pruned_loss=0.01064, audio_tagging_loss=0.006874, over 15536.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09115, pruned_loss=0.01249, audio_tagging_loss=0.008999, over 2956866.64 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:39:27,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3371340.0, ans=0.125 2023-11-26 11:39:33,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3371340.0, ans=0.2 2023-11-26 11:39:37,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3371406.6666666665, ans=0.125 2023-11-26 11:39:41,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3371406.6666666665, ans=0.0 2023-11-26 11:40:19,250 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505750 2023-11-26 11:40:22,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2023-11-26 11:40:22,312 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 750, loss[loss=0.07669, simple_loss=0.1059, pruned_loss=0.016, audio_tagging_loss=0.007727, over 15482.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09078, pruned_loss=0.01233, audio_tagging_loss=0.009005, over 2983641.62 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:40:32,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.67 vs. 
limit=12.0 2023-11-26 11:40:39,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3371740.0, ans=0.125 2023-11-26 11:40:49,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3371806.6666666665, ans=0.125 2023-11-26 11:40:53,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3371806.6666666665, ans=0.125 2023-11-26 11:40:58,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3371873.3333333335, ans=0.0 2023-11-26 11:41:03,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.569e+01 9.106e+01 9.803e+01 1.361e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-26 11:41:04,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3371873.3333333335, ans=0.125 2023-11-26 11:41:14,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3371940.0, ans=0.2 2023-11-26 11:41:15,678 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505800 2023-11-26 11:41:19,108 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 800, loss[loss=0.06732, simple_loss=0.09672, pruned_loss=0.0121, audio_tagging_loss=0.006869, over 14301.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09055, pruned_loss=0.01219, audio_tagging_loss=0.008962, over 2999156.67 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:41:22,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3372006.6666666665, ans=0.125 2023-11-26 11:41:28,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3372073.3333333335, ans=0.125 2023-11-26 11:41:32,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3372073.3333333335, ans=0.0 2023-11-26 11:41:34,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-26 11:41:35,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3372073.3333333335, ans=0.125 2023-11-26 11:41:54,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3372206.6666666665, ans=0.125 2023-11-26 11:42:11,427 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505850 2023-11-26 11:42:12,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=22.5 2023-11-26 11:42:12,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3372273.3333333335, ans=0.2 2023-11-26 11:42:14,555 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 850, loss[loss=0.06619, simple_loss=0.09053, pruned_loss=0.01277, audio_tagging_loss=0.008157, over 15031.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09076, pruned_loss=0.01224, audio_tagging_loss=0.00896, over 3013580.70 frames. 
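Most of the scaling.py:213 traffic reports ScheduledFloat values: hyperparameters such as dropout rates, skip rates and balancer probabilities that are piecewise-linear functions of the adjusted batch count rather than constants. A sketch of such a schedule, with illustrative breakpoints rather than the ones this model actually uses; this far into training (batch_count > 3e6) every schedule has long since reached its final value, which is why each ans repeats the same number:

class ScheduledFloatSketch:
    # Piecewise-linear float schedule keyed on batch count (a sketch,
    # not the icefall implementation).
    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.035))
print(skip_rate.value(3371940.0))  # 0.035, the fully annealed value
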
], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:42:28,627 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:42:40,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2023-11-26 11:42:43,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3372473.3333333335, ans=0.125 2023-11-26 11:42:45,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3372473.3333333335, ans=0.0 2023-11-26 11:42:52,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3372540.0, ans=0.125 2023-11-26 11:42:55,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.767e+01 9.422e+01 1.006e+02 1.207e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 11:42:58,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3372606.6666666665, ans=0.0 2023-11-26 11:43:07,285 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505900 2023-11-26 11:43:11,000 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 900, loss[loss=0.06197, simple_loss=0.08066, pruned_loss=0.01054, audio_tagging_loss=0.0111, over 15795.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09065, pruned_loss=0.01235, audio_tagging_loss=0.008983, over 3018662.38 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:43:13,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3372673.3333333335, ans=0.07 2023-11-26 11:43:15,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3372673.3333333335, ans=0.125 2023-11-26 11:43:16,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3372673.3333333335, ans=0.0 2023-11-26 11:43:17,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0 2023-11-26 11:44:04,599 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505950 2023-11-26 11:44:07,750 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 950, loss[loss=0.06266, simple_loss=0.08173, pruned_loss=0.01049, audio_tagging_loss=0.0113, over 16001.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09026, pruned_loss=0.01219, audio_tagging_loss=0.008904, over 3023916.31 frames. ], batch size: 61, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:44:15,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.80 vs. 
limit=10.0 2023-11-26 11:44:17,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3373073.3333333335, ans=0.125 2023-11-26 11:44:22,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3373073.3333333335, ans=0.2 2023-11-26 11:44:28,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3373140.0, ans=0.0 2023-11-26 11:44:33,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3373140.0, ans=0.125 2023-11-26 11:44:42,386 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:44:48,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.709e+01 9.307e+01 9.774e+01 1.284e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 11:44:50,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3373206.6666666665, ans=0.125 2023-11-26 11:44:59,661 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506000 2023-11-26 11:45:03,184 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1000, loss[loss=0.05815, simple_loss=0.07594, pruned_loss=0.01196, audio_tagging_loss=0.008216, over 15558.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09009, pruned_loss=0.0122, audio_tagging_loss=0.008781, over 3023983.75 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:45:05,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2023-11-26 11:45:27,322 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:45:41,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3373540.0, ans=0.125 2023-11-26 11:45:47,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3373606.6666666665, ans=0.125 2023-11-26 11:45:55,483 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506050 2023-11-26 11:45:58,638 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1050, loss[loss=0.0729, simple_loss=0.1067, pruned_loss=0.01354, audio_tagging_loss=0.006008, over 15936.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09112, pruned_loss=0.01232, audio_tagging_loss=0.008701, over 3037361.52 frames. 
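The tot_loss[...] summaries run over fractional frame totals (e.g. 3037361.52 frames) because the tracker is a decayed running sum rather than a plain one: each batch, the accumulated metrics, frame counts included, are multiplied by (1 - 1/reset_interval) before the new batch is added, so the totals settle near reset_interval times the frames per batch (about 200 * 15000 ≈ 3e6 here). A sketch of that update, with reset_interval = 200 as an assumed value:

def update_tot_loss(tot: dict, batch: dict,
                    reset_interval: int = 200) -> dict:
    # Decayed running sum: old batches fade with a ~200-batch horizon.
    decay = 1.0 - 1.0 / reset_interval
    keys = set(tot) | set(batch)
    return {k: decay * tot.get(k, 0.0) + batch.get(k, 0.0) for k in keys}
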
], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:46:07,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3373673.3333333335, ans=0.0 2023-11-26 11:46:31,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3373873.3333333335, ans=0.0 2023-11-26 11:46:32,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3373873.3333333335, ans=0.0 2023-11-26 11:46:36,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3373873.3333333335, ans=0.125 2023-11-26 11:46:39,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3373873.3333333335, ans=0.125 2023-11-26 11:46:39,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3373873.3333333335, ans=0.125 2023-11-26 11:46:40,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.798e+01 9.171e+01 9.764e+01 1.249e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-26 11:46:52,168 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506100 2023-11-26 11:46:55,362 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1100, loss[loss=0.0731, simple_loss=0.1079, pruned_loss=0.01217, audio_tagging_loss=0.006961, over 13960.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09033, pruned_loss=0.01225, audio_tagging_loss=0.008765, over 3034513.22 frames. ], batch size: 50, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:46:57,479 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:47:06,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3374073.3333333335, ans=0.1 2023-11-26 11:47:47,596 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506150 2023-11-26 11:47:50,745 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1150, loss[loss=0.05402, simple_loss=0.07609, pruned_loss=0.01052, audio_tagging_loss=0.005453, over 15533.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08983, pruned_loss=0.01227, audio_tagging_loss=0.008739, over 3038989.43 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:48:33,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.776e+01 9.671e+01 1.060e+02 1.290e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 11:48:43,894 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506200 2023-11-26 11:48:47,300 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1200, loss[loss=0.08577, simple_loss=0.1245, pruned_loss=0.01603, audio_tagging_loss=0.007512, over 15740.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09103, pruned_loss=0.0125, audio_tagging_loss=0.008568, over 3043038.25 frames. 
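The steady lr: 1.58e-03 is consistent with icefall's Eden schedule, which decays the learning rate with both batch index and epoch. Assuming base_lr = 0.045, lr_batches = 7500 and lr_epochs = 3.5 (typical zipformer recipe values, treated here as assumptions) and 42 completed epochs at this point in epoch 43:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden schedule (sketch): two symmetric decay factors.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=506100, epoch=42))  # ~1.58e-03, as logged
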
], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:48:57,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3374740.0, ans=0.2 2023-11-26 11:48:59,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.25 vs. limit=15.0 2023-11-26 11:49:00,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3374740.0, ans=0.125 2023-11-26 11:49:07,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=15.0 2023-11-26 11:49:08,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3374740.0, ans=0.125 2023-11-26 11:49:30,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3374873.3333333335, ans=0.015 2023-11-26 11:49:40,144 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506250 2023-11-26 11:49:43,815 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1250, loss[loss=0.07688, simple_loss=0.1121, pruned_loss=0.01435, audio_tagging_loss=0.006497, over 15448.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09044, pruned_loss=0.0124, audio_tagging_loss=0.008537, over 3051335.79 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:49:56,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3375073.3333333335, ans=0.0 2023-11-26 11:49:59,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3375073.3333333335, ans=0.0 2023-11-26 11:50:18,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3375206.6666666665, ans=0.1 2023-11-26 11:50:26,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.620e+01 9.220e+01 9.934e+01 1.296e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 11:50:36,720 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506300 2023-11-26 11:50:37,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2023-11-26 11:50:39,860 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1300, loss[loss=0.04348, simple_loss=0.06256, pruned_loss=0.004851, audio_tagging_loss=0.007349, over 14513.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08988, pruned_loss=0.01233, audio_tagging_loss=0.008567, over 3042491.08 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:50:46,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3375340.0, ans=0.125 2023-11-26 11:51:12,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3375540.0, ans=0.0 2023-11-26 11:51:13,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.21 vs. 
limit=15.0 2023-11-26 11:51:20,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3375540.0, ans=0.125 2023-11-26 11:51:25,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.81 vs. limit=12.0 2023-11-26 11:51:30,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3375606.6666666665, ans=0.125 2023-11-26 11:51:32,263 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506350 2023-11-26 11:51:36,021 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1350, loss[loss=0.05599, simple_loss=0.07131, pruned_loss=0.008958, audio_tagging_loss=0.01138, over 14735.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09093, pruned_loss=0.01239, audio_tagging_loss=0.008535, over 3042317.43 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:51:39,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3375673.3333333335, ans=0.0 2023-11-26 11:51:46,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3375740.0, ans=0.0 2023-11-26 11:52:11,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3375873.3333333335, ans=0.0 2023-11-26 11:52:15,832 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:52:19,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.796e+01 9.351e+01 9.969e+01 1.266e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 11:52:20,917 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:52:28,744 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506400 2023-11-26 11:52:32,100 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1400, loss[loss=0.06371, simple_loss=0.08511, pruned_loss=0.01047, audio_tagging_loss=0.01068, over 15424.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09189, pruned_loss=0.01263, audio_tagging_loss=0.008598, over 3046751.35 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:52:39,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3376006.6666666665, ans=0.125 2023-11-26 11:52:53,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3376140.0, ans=0.0 2023-11-26 11:53:08,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3376206.6666666665, ans=0.1 2023-11-26 11:53:14,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.89 vs. 
limit=22.5 2023-11-26 11:53:22,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0 2023-11-26 11:53:24,996 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506450 2023-11-26 11:53:26,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0 2023-11-26 11:53:28,648 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1450, loss[loss=0.07915, simple_loss=0.1161, pruned_loss=0.01401, audio_tagging_loss=0.00708, over 15730.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09219, pruned_loss=0.01273, audio_tagging_loss=0.008611, over 3046472.71 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:53:42,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3376406.6666666665, ans=0.025 2023-11-26 11:53:47,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3376406.6666666665, ans=0.125 2023-11-26 11:53:52,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5 2023-11-26 11:54:12,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.794e+01 9.262e+01 1.013e+02 1.743e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 11:54:12,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2023-11-26 11:54:13,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3376606.6666666665, ans=0.0 2023-11-26 11:54:17,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3376606.6666666665, ans=0.0 2023-11-26 11:54:20,647 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506500 2023-11-26 11:54:23,737 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1500, loss[loss=0.06587, simple_loss=0.09113, pruned_loss=0.01126, audio_tagging_loss=0.009045, over 14969.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09182, pruned_loss=0.01271, audio_tagging_loss=0.008668, over 3041310.54 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:54:54,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3376806.6666666665, ans=0.0 2023-11-26 11:55:00,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3376873.3333333335, ans=0.125 2023-11-26 11:55:01,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3376873.3333333335, ans=0.0 2023-11-26 11:55:15,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3376940.0, ans=0.02 2023-11-26 11:55:16,595 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506550 2023-11-26 11:55:20,270 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1550, loss[loss=0.05926, simple_loss=0.08041, pruned_loss=0.008139, audio_tagging_loss=0.01091, over 16015.00 frames. 
], tot_loss[loss=0.06683, simple_loss=0.09092, pruned_loss=0.01253, audio_tagging_loss=0.00884, over 3039866.29 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 4.0 2023-11-26 11:55:41,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3377140.0, ans=0.125 2023-11-26 11:55:48,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3377140.0, ans=0.125 2023-11-26 11:55:49,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3377140.0, ans=0.125 2023-11-26 11:56:06,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.914e+01 9.617e+01 1.050e+02 1.576e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 11:56:12,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.09 vs. limit=22.5 2023-11-26 11:56:13,027 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506600 2023-11-26 11:56:16,429 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1600, loss[loss=0.0587, simple_loss=0.08176, pruned_loss=0.009869, audio_tagging_loss=0.007953, over 14896.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09081, pruned_loss=0.01245, audio_tagging_loss=0.008931, over 3040533.53 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:56:26,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3377406.6666666665, ans=0.125 2023-11-26 11:56:26,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3377406.6666666665, ans=0.0 2023-11-26 11:56:27,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3377406.6666666665, ans=0.07 2023-11-26 11:56:48,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2023-11-26 11:56:49,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2023-11-26 11:57:00,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.46 vs. limit=6.0 2023-11-26 11:57:02,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-26 11:57:09,303 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506650 2023-11-26 11:57:12,407 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1650, loss[loss=0.0532, simple_loss=0.07051, pruned_loss=0.01009, audio_tagging_loss=0.007855, over 14871.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.0906, pruned_loss=0.01246, audio_tagging_loss=0.008935, over 3036046.93 frames. 
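Each optim.py:476 line reports five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms, and the clipping threshold is Clipping_scale times the median: in the entry above, 2.0 * 9.617e+01 = 1.923e+02, exactly the logged threshold. A sketch of that rule, with the rolling bookkeeping of past norms simplified away:

import torch

def clipping_threshold(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> float:
    # Threshold = clipping_scale * median; gradients whose norm exceeds
    # it are scaled down and counted in percent-clipped.
    quartiles = torch.quantile(
        recent_grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    return clipping_scale * quartiles[2].item()
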
], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:57:46,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3377873.3333333335, ans=0.1 2023-11-26 11:57:47,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3377873.3333333335, ans=0.125 2023-11-26 11:57:56,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3377940.0, ans=0.2 2023-11-26 11:57:58,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.844e+01 9.530e+01 1.032e+02 1.539e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 11:58:05,876 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506700 2023-11-26 11:58:09,009 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1700, loss[loss=0.06904, simple_loss=0.1011, pruned_loss=0.01223, audio_tagging_loss=0.00628, over 15751.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08989, pruned_loss=0.01236, audio_tagging_loss=0.008974, over 3049427.66 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:58:15,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3378006.6666666665, ans=0.125 2023-11-26 11:58:28,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3378073.3333333335, ans=0.125 2023-11-26 11:58:30,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3378140.0, ans=0.5 2023-11-26 11:58:34,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3378140.0, ans=0.125 2023-11-26 11:58:54,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3378273.3333333335, ans=0.0 2023-11-26 11:58:55,524 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:59:02,207 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506750 2023-11-26 11:59:05,388 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1750, loss[loss=0.08576, simple_loss=0.1148, pruned_loss=0.02242, audio_tagging_loss=0.005957, over 15791.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08938, pruned_loss=0.01251, audio_tagging_loss=0.009035, over 3046661.70 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:59:08,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3378340.0, ans=0.125 2023-11-26 11:59:41,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. 
limit=15.0 2023-11-26 11:59:45,234 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:59:51,255 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.598e+01 9.201e+01 1.005e+02 1.270e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 11:59:57,758 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506800 2023-11-26 12:00:01,109 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1800, loss[loss=0.06495, simple_loss=0.0937, pruned_loss=0.01284, audio_tagging_loss=0.005266, over 14396.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08902, pruned_loss=0.0123, audio_tagging_loss=0.008979, over 3038320.51 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:00:11,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3378740.0, ans=0.0 2023-11-26 12:00:19,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3378740.0, ans=0.125 2023-11-26 12:00:24,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3378806.6666666665, ans=0.0 2023-11-26 12:00:37,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3378873.3333333335, ans=0.125 2023-11-26 12:00:54,399 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506850 2023-11-26 12:00:57,539 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1850, loss[loss=0.08396, simple_loss=0.1134, pruned_loss=0.01811, audio_tagging_loss=0.009151, over 15024.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08868, pruned_loss=0.01224, audio_tagging_loss=0.008964, over 3040567.82 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:01:32,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.30 vs. limit=10.0 2023-11-26 12:01:43,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.799e+01 9.499e+01 1.025e+02 1.305e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 12:01:50,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3379273.3333333335, ans=0.0 2023-11-26 12:01:50,855 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506900 2023-11-26 12:01:53,968 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1900, loss[loss=0.05489, simple_loss=0.06867, pruned_loss=0.01346, audio_tagging_loss=0.0071, over 16412.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08921, pruned_loss=0.01238, audio_tagging_loss=0.008836, over 3043503.12 frames. 
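grad_scale in the per-batch lines is the fp16 loss scale, and its movement among 4.0, 8.0, 16.0 and 32.0 looks like ordinary dynamic loss scaling: halve after an overflowing step, double after a long enough run of clean steps (it rises to 16.0 by batch 2000 and drops from 32.0 back to 16.0 at batch 2450 below). A sketch with PyTorch's stock scaler; the factors and interval are PyTorch defaults, not values read from this run:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # in the range seen around these batches
    growth_factor=2.0,     # doubling on sustained clean steps
    backoff_factor=0.5,    # halving on inf/nan gradients
    growth_interval=2000,  # clean steps required before growing
)
# Typical step: scaler.scale(loss).backward(); scaler.step(optimizer);
# scaler.update()  # update() is where grad_scale moves.
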
], batch size: 63, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:01:55,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3379340.0, ans=0.125 2023-11-26 12:02:08,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3379406.6666666665, ans=0.0 2023-11-26 12:02:12,425 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:02:16,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3379473.3333333335, ans=0.0 2023-11-26 12:02:22,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3379473.3333333335, ans=0.125 2023-11-26 12:02:24,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3379473.3333333335, ans=0.125 2023-11-26 12:02:46,123 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506950 2023-11-26 12:02:49,331 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1950, loss[loss=0.06709, simple_loss=0.09662, pruned_loss=0.01045, audio_tagging_loss=0.008334, over 15210.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08945, pruned_loss=0.01231, audio_tagging_loss=0.00868, over 3045157.95 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:02:50,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3379673.3333333335, ans=0.125 2023-11-26 12:02:55,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3379673.3333333335, ans=0.0 2023-11-26 12:02:57,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3379673.3333333335, ans=0.0 2023-11-26 12:03:02,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.83 vs. limit=10.0 2023-11-26 12:03:03,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3379740.0, ans=10.0 2023-11-26 12:03:15,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3379806.6666666665, ans=0.125 2023-11-26 12:03:32,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3379873.3333333335, ans=0.0 2023-11-26 12:03:35,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.710e+01 9.475e+01 1.012e+02 2.962e+02, threshold=1.895e+02, percent-clipped=1.0 2023-11-26 12:03:39,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.23 vs. limit=10.0 2023-11-26 12:03:42,557 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507000 2023-11-26 12:03:45,967 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2000, loss[loss=0.06154, simple_loss=0.08868, pruned_loss=0.00919, audio_tagging_loss=0.008006, over 14586.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08858, pruned_loss=0.01222, audio_tagging_loss=0.008754, over 3042446.41 frames. 
], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:04:11,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3380140.0, ans=0.05 2023-11-26 12:04:15,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.32 vs. limit=15.0 2023-11-26 12:04:39,506 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507050 2023-11-26 12:04:42,702 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2050, loss[loss=0.071, simple_loss=0.1042, pruned_loss=0.01066, audio_tagging_loss=0.008257, over 15035.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.089, pruned_loss=0.01221, audio_tagging_loss=0.008784, over 3041582.61 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:04:45,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3380340.0, ans=0.125 2023-11-26 12:04:53,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3380406.6666666665, ans=0.0 2023-11-26 12:05:13,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3380473.3333333335, ans=0.1 2023-11-26 12:05:27,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3380606.6666666665, ans=0.0 2023-11-26 12:05:28,415 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.859e+01 9.273e+01 1.013e+02 1.302e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 12:05:34,980 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507100 2023-11-26 12:05:38,120 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2100, loss[loss=0.05369, simple_loss=0.07612, pruned_loss=0.006668, audio_tagging_loss=0.008966, over 15229.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08959, pruned_loss=0.01224, audio_tagging_loss=0.008688, over 3038237.61 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:05:56,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3380740.0, ans=0.125 2023-11-26 12:06:19,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3380873.3333333335, ans=0.125 2023-11-26 12:06:20,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3380873.3333333335, ans=0.125 2023-11-26 12:06:22,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.67 vs. 
limit=15.0 2023-11-26 12:06:27,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3380940.0, ans=0.0 2023-11-26 12:06:28,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3380940.0, ans=0.125 2023-11-26 12:06:30,266 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507150 2023-11-26 12:06:33,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3381006.6666666665, ans=0.125 2023-11-26 12:06:33,907 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2150, loss[loss=0.07826, simple_loss=0.1146, pruned_loss=0.01506, audio_tagging_loss=0.005908, over 15960.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08948, pruned_loss=0.01224, audio_tagging_loss=0.00868, over 3040066.47 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:07:05,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3381140.0, ans=0.125 2023-11-26 12:07:07,580 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:07:13,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3381206.6666666665, ans=0.125 2023-11-26 12:07:19,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.634e+01 9.242e+01 1.004e+02 1.355e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 12:07:26,697 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507200 2023-11-26 12:07:26,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3381273.3333333335, ans=0.0 2023-11-26 12:07:30,658 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2200, loss[loss=0.06636, simple_loss=0.08607, pruned_loss=0.01325, audio_tagging_loss=0.01007, over 14460.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08905, pruned_loss=0.01194, audio_tagging_loss=0.008742, over 3042097.83 frames. 
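The recurring "Exclude cut ... Dummy text added as a place holder" warnings are the training loop dropping AudioSet clips whose placeholder transcript is longer than the clip can support: a 1-second cut has 100 feature frames, about 23 frames after the encoder's 4x subsampling, while the dummy transcript encodes to 24 BPE tokens, so a transducer alignment is impossible. A sketch of such a filter; the subsampling formula and the cut/sp helpers are assumptions standing in for lhotse cuts and the sentencepiece model:

def keep_cut(cut, sp) -> bool:
    num_frames = cut.num_frames                      # 100 for a 1 s clip
    frames_after = ((num_frames - 7) // 2 + 1) // 2  # assumed conv stack: 100 -> 23
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    return frames_after >= len(tokens)               # 23 < 24 -> excluded
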
], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:07:41,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3381406.6666666665, ans=0.2 2023-11-26 12:07:46,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3381406.6666666665, ans=0.2 2023-11-26 12:07:54,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3381473.3333333335, ans=0.0 2023-11-26 12:07:58,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3381473.3333333335, ans=0.125 2023-11-26 12:08:00,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3381473.3333333335, ans=0.07 2023-11-26 12:08:08,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3381540.0, ans=0.125 2023-11-26 12:08:23,015 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507250 2023-11-26 12:08:26,133 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2250, loss[loss=0.06738, simple_loss=0.09286, pruned_loss=0.01204, audio_tagging_loss=0.008909, over 14346.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.0893, pruned_loss=0.01205, audio_tagging_loss=0.008797, over 3043972.94 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:08:40,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3381740.0, ans=0.04949747468305833 2023-11-26 12:09:11,483 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.710e+01 9.301e+01 1.010e+02 1.448e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 12:09:13,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3381940.0, ans=0.0 2023-11-26 12:09:17,937 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507300 2023-11-26 12:09:20,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3382006.6666666665, ans=0.125 2023-11-26 12:09:21,654 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2300, loss[loss=0.0669, simple_loss=0.09784, pruned_loss=0.01052, audio_tagging_loss=0.007457, over 15372.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08887, pruned_loss=0.01216, audio_tagging_loss=0.008799, over 3038698.72 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:09:30,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.79 vs. limit=6.0 2023-11-26 12:09:42,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3382073.3333333335, ans=0.125 2023-11-26 12:09:49,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3382140.0, ans=0.0 2023-11-26 12:09:56,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3382206.6666666665, ans=0.125 2023-11-26 12:10:10,772 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:10:11,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0 2023-11-26 12:10:14,600 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507350 2023-11-26 12:10:17,695 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2350, loss[loss=0.05428, simple_loss=0.07357, pruned_loss=0.009108, audio_tagging_loss=0.008387, over 14942.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08896, pruned_loss=0.01233, audio_tagging_loss=0.008855, over 3043908.26 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:10:31,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3382406.6666666665, ans=0.1 2023-11-26 12:10:47,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3382473.3333333335, ans=0.05 2023-11-26 12:11:01,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3382606.6666666665, ans=0.2 2023-11-26 12:11:03,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3382606.6666666665, ans=0.1 2023-11-26 12:11:04,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 8.890e+01 9.561e+01 1.021e+02 1.290e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 12:11:11,209 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507400 2023-11-26 12:11:11,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-26 12:11:14,682 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2400, loss[loss=0.07298, simple_loss=0.1073, pruned_loss=0.0108, audio_tagging_loss=0.00852, over 15533.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08941, pruned_loss=0.01248, audio_tagging_loss=0.008948, over 3040478.81 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 12:11:33,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3382740.0, ans=0.1 2023-11-26 12:11:34,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2023-11-26 12:12:04,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3382940.0, ans=0.1 2023-11-26 12:12:05,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3382940.0, ans=0.0 2023-11-26 12:12:06,709 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507450 2023-11-26 12:12:09,773 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2450, loss[loss=0.08582, simple_loss=0.1167, pruned_loss=0.01983, audio_tagging_loss=0.007644, over 15026.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08945, pruned_loss=0.01234, audio_tagging_loss=0.009007, over 3044509.67 frames. 
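The scaling.py:1022 Whitening lines periodically report how close a layer's activation covariance is to white. One plausible reading of the metric, not the exact icefall code: for covariance C it is mean(diag(C @ C)) / mean(diag(C))**2, which equals 1.0 when all eigenvalues of C are equal and grows as they spread, and a corrective gradient penalty would apply only while the metric exceeds the logged limit:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (..., num_channels) activations; ~1.0 means fully white.
    x = x.reshape(-1, x.shape[-1])
    cov = (x.t() @ x) / x.shape[0]     # uncentered covariance (assumption)
    num = torch.diagonal(cov @ cov).mean()
    den = torch.diagonal(cov).mean() ** 2
    return num / (den + 1e-20)
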
], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:12:20,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383073.3333333335, ans=0.1 2023-11-26 12:12:23,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=22.5 2023-11-26 12:12:24,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3383073.3333333335, ans=0.125 2023-11-26 12:12:46,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3383206.6666666665, ans=0.1 2023-11-26 12:12:50,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383206.6666666665, ans=0.1 2023-11-26 12:12:57,199 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.728e+01 9.307e+01 9.934e+01 1.225e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 12:13:02,385 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507500 2023-11-26 12:13:06,022 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2500, loss[loss=0.07713, simple_loss=0.1131, pruned_loss=0.01354, audio_tagging_loss=0.007041, over 14863.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08851, pruned_loss=0.01214, audio_tagging_loss=0.009103, over 3042090.84 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:13:11,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3383340.0, ans=0.0 2023-11-26 12:13:14,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3383340.0, ans=0.125 2023-11-26 12:13:15,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3383340.0, ans=0.125 2023-11-26 12:13:34,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3383473.3333333335, ans=0.2 2023-11-26 12:13:55,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3383606.6666666665, ans=6.0 2023-11-26 12:13:55,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3383606.6666666665, ans=0.0 2023-11-26 12:13:59,021 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507550 2023-11-26 12:14:02,166 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2550, loss[loss=0.06843, simple_loss=0.1002, pruned_loss=0.01211, audio_tagging_loss=0.00622, over 15411.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08908, pruned_loss=0.01229, audio_tagging_loss=0.009023, over 3044506.11 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:14:15,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.11 vs. 
limit=12.0 2023-11-26 12:14:20,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3383740.0, ans=0.1 2023-11-26 12:14:32,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3383806.6666666665, ans=0.125 2023-11-26 12:14:43,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3383873.3333333335, ans=0.125 2023-11-26 12:14:46,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3383940.0, ans=0.1 2023-11-26 12:14:49,149 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.661e+01 9.276e+01 1.004e+02 1.739e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 12:14:53,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=8.0 2023-11-26 12:14:54,980 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507600 2023-11-26 12:14:56,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.60 vs. limit=10.0 2023-11-26 12:14:58,337 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2600, loss[loss=0.04443, simple_loss=0.05454, pruned_loss=0.00663, audio_tagging_loss=0.01053, over 15594.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08843, pruned_loss=0.01213, audio_tagging_loss=0.008939, over 3042865.39 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:15:09,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3384073.3333333335, ans=0.1 2023-11-26 12:15:24,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3384140.0, ans=0.1 2023-11-26 12:15:39,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3384206.6666666665, ans=0.1 2023-11-26 12:15:40,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3384206.6666666665, ans=0.0 2023-11-26 12:15:51,092 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507650 2023-11-26 12:15:54,229 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2650, loss[loss=0.06562, simple_loss=0.08618, pruned_loss=0.01239, audio_tagging_loss=0.01014, over 14648.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08802, pruned_loss=0.01199, audio_tagging_loss=0.008897, over 3042152.95 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:16:02,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.97 vs. 
limit=22.5 2023-11-26 12:16:07,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3384406.6666666665, ans=0.0 2023-11-26 12:16:11,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3384406.6666666665, ans=0.0 2023-11-26 12:16:16,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3384473.3333333335, ans=0.1 2023-11-26 12:16:22,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3384473.3333333335, ans=0.125 2023-11-26 12:16:24,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2023-11-26 12:16:42,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.481e+01 8.705e+01 9.342e+01 1.013e+02 1.276e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 12:16:47,321 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507700 2023-11-26 12:16:49,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3384673.3333333335, ans=0.125 2023-11-26 12:16:50,472 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2700, loss[loss=0.05615, simple_loss=0.06764, pruned_loss=0.01189, audio_tagging_loss=0.01044, over 14286.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08735, pruned_loss=0.0119, audio_tagging_loss=0.008905, over 3038009.94 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:17:07,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3384740.0, ans=0.2 2023-11-26 12:17:10,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3384740.0, ans=0.0 2023-11-26 12:17:28,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3384873.3333333335, ans=0.125 2023-11-26 12:17:32,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3384873.3333333335, ans=0.125 2023-11-26 12:17:34,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2023-11-26 12:17:38,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3384940.0, ans=0.125 2023-11-26 12:17:41,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3384940.0, ans=0.125 2023-11-26 12:17:42,643 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507750 2023-11-26 12:17:45,898 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2750, loss[loss=0.05071, simple_loss=0.06785, pruned_loss=0.006377, audio_tagging_loss=0.01041, over 14863.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08746, pruned_loss=0.01173, audio_tagging_loss=0.008807, over 3046606.05 frames. 
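A consistency check on the per-batch loss records: the logged total is a weighted sum of its components with, judging purely from the numbers, a 0.5 weight on `simple_loss` and unit weights on `pruned_loss` and `audio_tagging_loss`. For the batch 2700 record above, 0.5 x 0.06764 + 0.01189 + 0.01044 = 0.05615, exactly the printed `loss`. A sketch; the weights are inferred from the records, not read out of the code:

```python
# Reconstruct the logged total loss from its components. The 0.5/1.0/1.0
# weights are inferred by fitting the logged records, not taken from code.
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, at_scale=1.0):
    return simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss

# Epoch 43, batch 2700: loss=0.05615, simple=0.06764, pruned=0.01189, at=0.01044
print(combined_loss(0.06764, 0.01189, 0.01044))  # ~0.05615, matches the record
```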
], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:17:53,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3385006.6666666665, ans=0.0 2023-11-26 12:17:55,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3385006.6666666665, ans=0.07 2023-11-26 12:18:23,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3385206.6666666665, ans=0.0 2023-11-26 12:18:24,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3385206.6666666665, ans=0.05 2023-11-26 12:18:26,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3385206.6666666665, ans=0.95 2023-11-26 12:18:27,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3385206.6666666665, ans=0.125 2023-11-26 12:18:34,401 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.797e+01 9.310e+01 1.006e+02 1.204e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 12:18:36,016 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:18:36,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=12.0 2023-11-26 12:18:37,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3385273.3333333335, ans=0.0 2023-11-26 12:18:39,231 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507800 2023-11-26 12:18:42,660 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2800, loss[loss=0.07644, simple_loss=0.1123, pruned_loss=0.0136, audio_tagging_loss=0.00671, over 15298.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08905, pruned_loss=0.01199, audio_tagging_loss=0.008731, over 3050130.59 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:18:47,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.80 vs. 
limit=15.0 2023-11-26 12:18:48,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3385340.0, ans=0.1 2023-11-26 12:18:55,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3385406.6666666665, ans=0.125 2023-11-26 12:19:06,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3385473.3333333335, ans=0.0 2023-11-26 12:19:08,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3385473.3333333335, ans=0.2 2023-11-26 12:19:36,163 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507850 2023-11-26 12:19:39,344 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2850, loss[loss=0.05186, simple_loss=0.06512, pruned_loss=0.006417, audio_tagging_loss=0.01288, over 15431.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08891, pruned_loss=0.01202, audio_tagging_loss=0.008787, over 3052995.76 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:19:45,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-26 12:19:54,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3385740.0, ans=0.125 2023-11-26 12:19:57,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2023-11-26 12:20:07,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3385806.6666666665, ans=0.0 2023-11-26 12:20:08,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2023-11-26 12:20:13,192 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:20:21,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3385873.3333333335, ans=0.2 2023-11-26 12:20:28,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.713e+01 9.303e+01 9.917e+01 1.324e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 12:20:30,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3385940.0, ans=0.04949747468305833 2023-11-26 12:20:31,667 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507900 2023-11-26 12:20:34,807 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2900, loss[loss=0.07515, simple_loss=0.1052, pruned_loss=0.01615, audio_tagging_loss=0.006407, over 14778.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09007, pruned_loss=0.01228, audio_tagging_loss=0.008684, over 3051113.62 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:20:47,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.07 vs. 
limit=22.5 2023-11-26 12:20:49,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3386073.3333333335, ans=0.2 2023-11-26 12:20:54,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3386073.3333333335, ans=0.07 2023-11-26 12:21:11,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3386206.6666666665, ans=0.125 2023-11-26 12:21:23,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3386273.3333333335, ans=0.125 2023-11-26 12:21:27,880 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507950 2023-11-26 12:21:31,522 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2950, loss[loss=0.06838, simple_loss=0.09937, pruned_loss=0.01277, audio_tagging_loss=0.005928, over 14299.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09145, pruned_loss=0.01245, audio_tagging_loss=0.008603, over 3050896.01 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:21:42,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=15.0 2023-11-26 12:22:06,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3386540.0, ans=0.125 2023-11-26 12:22:14,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3386540.0, ans=0.125 2023-11-26 12:22:15,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3386606.6666666665, ans=0.125 2023-11-26 12:22:20,945 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.672e+01 9.532e+01 9.988e+01 1.402e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 12:22:24,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508000 2023-11-26 12:22:30,158 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3000, loss[loss=0.06025, simple_loss=0.0847, pruned_loss=0.00809, audio_tagging_loss=0.009814, over 15378.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09043, pruned_loss=0.01223, audio_tagging_loss=0.008727, over 3046955.93 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:22:30,159 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 12:22:43,865 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3575, 4.0047, 2.5722, 3.7107], device='cuda:1') 2023-11-26 12:23:02,712 INFO [train_asr.py:1267] (1/4) Epoch 43, validation: loss=0.05754, simple_loss=0.05056, pruned_loss=0.00524, audio_tagging_loss=0.02702, over 4681554.00 frames. 
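The validation pass above fires at batch 3000 inside the epoch, i.e. on a fixed batch interval rather than at epoch boundaries, and is followed by a report of peak CUDA memory. A sketch of that control flow; the interval of 3000 is an assumption read off the batch number where the validation records appear, and `valid_loader` / `compute_loss` are placeholder names:

```python
# Sketch of the periodic validation branch in the training loop.
# VALID_INTERVAL = 3000 is assumed from where "Computing validation loss"
# shows up in the log (batch 3000 within the epoch).
import torch

VALID_INTERVAL = 3000

def maybe_validate(batch_idx, model, valid_loader, compute_loss):
    if batch_idx % VALID_INTERVAL != 0:
        return
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    print(f"validation: loss={tot_loss / tot_frames:.4g}, over {tot_frames} frames.")
    # matches the "Maximum memory allocated so far is ...MB" record
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated() // 2**20}MB")
    model.train()
```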
2023-11-26 12:23:02,713 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 12:23:02,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3386673.3333333335, ans=0.125 2023-11-26 12:23:03,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3386673.3333333335, ans=0.1 2023-11-26 12:23:29,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3386806.6666666665, ans=0.2 2023-11-26 12:23:34,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3386806.6666666665, ans=0.125 2023-11-26 12:23:35,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3386873.3333333335, ans=0.125 2023-11-26 12:23:35,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3386873.3333333335, ans=0.125 2023-11-26 12:23:38,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-26 12:23:38,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3386873.3333333335, ans=0.125 2023-11-26 12:23:55,289 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508050 2023-11-26 12:23:58,987 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3050, loss[loss=0.0614, simple_loss=0.08148, pruned_loss=0.01136, audio_tagging_loss=0.009298, over 14379.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09086, pruned_loss=0.01238, audio_tagging_loss=0.008758, over 3040869.58 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:24:00,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3387006.6666666665, ans=0.0 2023-11-26 12:24:02,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3387006.6666666665, ans=0.1 2023-11-26 12:24:23,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3387140.0, ans=0.125 2023-11-26 12:24:32,791 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:24:37,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. 
limit=10.0 2023-11-26 12:24:48,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.074e+01 8.532e+01 9.331e+01 1.001e+02 1.251e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 12:24:52,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508100 2023-11-26 12:24:55,664 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3100, loss[loss=0.0681, simple_loss=0.09393, pruned_loss=0.01371, audio_tagging_loss=0.007422, over 15145.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09055, pruned_loss=0.01247, audio_tagging_loss=0.008775, over 3042461.03 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:24:56,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3387340.0, ans=0.125 2023-11-26 12:25:08,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3387406.6666666665, ans=0.015 2023-11-26 12:25:10,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-26 12:25:19,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3387473.3333333335, ans=0.04949747468305833 2023-11-26 12:25:32,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3387540.0, ans=0.5 2023-11-26 12:25:35,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3387540.0, ans=0.0 2023-11-26 12:25:47,452 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508150 2023-11-26 12:25:48,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3387606.6666666665, ans=0.1 2023-11-26 12:25:50,632 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3150, loss[loss=0.0629, simple_loss=0.08757, pruned_loss=0.01149, audio_tagging_loss=0.007624, over 16386.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09033, pruned_loss=0.01234, audio_tagging_loss=0.008805, over 3043094.09 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:26:27,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=12.0 2023-11-26 12:26:32,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3387873.3333333335, ans=0.0 2023-11-26 12:26:39,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.956e+01 9.437e+01 1.017e+02 1.314e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 12:26:42,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3387940.0, ans=0.125 2023-11-26 12:26:43,687 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508200 2023-11-26 12:26:47,024 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3200, loss[loss=0.08405, simple_loss=0.1105, pruned_loss=0.01962, audio_tagging_loss=0.009187, over 15200.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09016, pruned_loss=0.01234, audio_tagging_loss=0.008843, over 3046077.37 frames. 
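The `optim.py:476` records decode cleanly: the five numbers are the min/25%/50%/75%/max of recent per-batch gradient norms, and the printed threshold is exactly `Clipping_scale` times the median, e.g. 2.0 x 9.331e+01 = 1.866e+02 in the 12:24:48 record above; `percent-clipped=0.0` says no batch in the window exceeded it. A sketch of that statistic; the window length is an assumption, only the threshold-from-median rule is taken from the log:

```python
# Median-based gradient clipping consistent with the optim.py records:
# threshold = clipping_scale * median of recent gradient norms.
# The window size (200) is an assumption for illustration.
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale=2.0, window=200):
        self.scale = clipping_scale
        self.window = window
        self.norms = []

    def __call__(self, parameters):
        parameters = [p for p in parameters if p.grad is not None]
        # global gradient norm over all parameters
        norm = torch.norm(torch.stack([p.grad.norm() for p in parameters]))
        self.norms = (self.norms + [norm.item()])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()  # 2.0 * median
        if norm > threshold:
            for p in parameters:
                p.grad.mul_(threshold / norm)  # scale the step down
        return q, threshold
```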
], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:26:51,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3388006.6666666665, ans=0.0 2023-11-26 12:26:58,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0 2023-11-26 12:27:20,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2023-11-26 12:27:35,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3388273.3333333335, ans=0.0 2023-11-26 12:27:40,471 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508250 2023-11-26 12:27:43,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3388340.0, ans=0.125 2023-11-26 12:27:44,163 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3250, loss[loss=0.05412, simple_loss=0.07231, pruned_loss=0.007212, audio_tagging_loss=0.01075, over 15176.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08968, pruned_loss=0.01223, audio_tagging_loss=0.008914, over 3053691.08 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:27:58,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3388406.6666666665, ans=0.2 2023-11-26 12:28:09,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3388473.3333333335, ans=0.125 2023-11-26 12:28:12,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3388473.3333333335, ans=0.0 2023-11-26 12:28:25,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3388540.0, ans=0.0 2023-11-26 12:28:32,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3388606.6666666665, ans=0.2 2023-11-26 12:28:33,656 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.911e+01 9.477e+01 1.008e+02 1.285e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 12:28:33,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3388606.6666666665, ans=0.125 2023-11-26 12:28:36,993 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508300 2023-11-26 12:28:39,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3388673.3333333335, ans=0.0 2023-11-26 12:28:40,130 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3300, loss[loss=0.08497, simple_loss=0.1208, pruned_loss=0.01541, audio_tagging_loss=0.009174, over 16562.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09013, pruned_loss=0.01229, audio_tagging_loss=0.009036, over 3047659.54 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:28:43,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.13 vs. 
limit=15.0 2023-11-26 12:28:45,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=22.5 2023-11-26 12:28:59,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3388740.0, ans=0.125 2023-11-26 12:29:04,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3388806.6666666665, ans=0.1 2023-11-26 12:29:30,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3388940.0, ans=0.125 2023-11-26 12:29:32,637 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508350 2023-11-26 12:29:33,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3388940.0, ans=0.125 2023-11-26 12:29:35,769 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3350, loss[loss=0.07389, simple_loss=0.09518, pruned_loss=0.01707, audio_tagging_loss=0.009233, over 15531.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09004, pruned_loss=0.0124, audio_tagging_loss=0.008997, over 3049845.50 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:29:40,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3389006.6666666665, ans=0.125 2023-11-26 12:29:51,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3389073.3333333335, ans=0.0 2023-11-26 12:29:59,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.19 vs. limit=10.0 2023-11-26 12:30:00,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3389140.0, ans=0.125 2023-11-26 12:30:20,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3389273.3333333335, ans=0.125 2023-11-26 12:30:25,724 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.755e+01 9.551e+01 1.033e+02 1.237e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 12:30:29,040 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508400 2023-11-26 12:30:33,000 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3400, loss[loss=0.04184, simple_loss=0.05203, pruned_loss=0.006459, audio_tagging_loss=0.009364, over 13714.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09054, pruned_loss=0.01244, audio_tagging_loss=0.008753, over 3048121.49 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:30:41,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.12 vs. 
limit=15.0 2023-11-26 12:30:44,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3389406.6666666665, ans=6.0 2023-11-26 12:30:50,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3389406.6666666665, ans=0.0 2023-11-26 12:30:56,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3389473.3333333335, ans=0.125 2023-11-26 12:31:24,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3389606.6666666665, ans=0.125 2023-11-26 12:31:25,716 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508450 2023-11-26 12:31:28,815 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3450, loss[loss=0.09418, simple_loss=0.1327, pruned_loss=0.02062, audio_tagging_loss=0.007228, over 14638.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09153, pruned_loss=0.01248, audio_tagging_loss=0.008587, over 3050433.88 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:31:56,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=22.5 2023-11-26 12:31:59,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.78 vs. limit=8.0 2023-11-26 12:32:18,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.804e+01 9.534e+01 1.051e+02 1.288e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 12:32:21,351 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508500 2023-11-26 12:32:22,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3389940.0, ans=0.0 2023-11-26 12:32:25,070 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3500, loss[loss=0.06129, simple_loss=0.09656, pruned_loss=0.00632, audio_tagging_loss=0.006688, over 15312.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09034, pruned_loss=0.01224, audio_tagging_loss=0.008544, over 3048868.62 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:32:31,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.54 vs. limit=10.0 2023-11-26 12:32:35,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3390073.3333333335, ans=0.0 2023-11-26 12:32:37,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3390073.3333333335, ans=0.125 2023-11-26 12:32:39,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3390073.3333333335, ans=0.0 2023-11-26 12:32:39,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3390073.3333333335, ans=0.1 2023-11-26 12:32:49,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3390140.0, ans=0.125 2023-11-26 12:32:55,566 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:33:03,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=22.5 2023-11-26 12:33:04,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3390206.6666666665, ans=0.125 2023-11-26 12:33:10,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5 2023-11-26 12:33:16,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3390273.3333333335, ans=0.0 2023-11-26 12:33:17,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508550 2023-11-26 12:33:21,104 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3550, loss[loss=0.07978, simple_loss=0.1079, pruned_loss=0.01862, audio_tagging_loss=0.007228, over 14287.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08902, pruned_loss=0.01204, audio_tagging_loss=0.008578, over 3043719.04 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:33:22,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3390340.0, ans=0.125 2023-11-26 12:33:50,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3390473.3333333335, ans=0.125 2023-11-26 12:33:52,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3390473.3333333335, ans=0.125 2023-11-26 12:34:04,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2023-11-26 12:34:09,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0 2023-11-26 12:34:10,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.734e+01 9.266e+01 9.991e+01 1.201e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 12:34:14,199 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508600 2023-11-26 12:34:18,193 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3600, loss[loss=0.07751, simple_loss=0.1036, pruned_loss=0.01669, audio_tagging_loss=0.00903, over 14321.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08833, pruned_loss=0.01203, audio_tagging_loss=0.008544, over 3042592.95 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:34:29,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3390740.0, ans=0.125 2023-11-26 12:34:32,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.02 vs. 
limit=10.0 2023-11-26 12:34:51,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3390873.3333333335, ans=0.2 2023-11-26 12:34:54,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3390873.3333333335, ans=0.05 2023-11-26 12:35:05,270 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:35:10,377 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508650 2023-11-26 12:35:13,508 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3650, loss[loss=0.05754, simple_loss=0.08106, pruned_loss=0.008862, audio_tagging_loss=0.008149, over 15357.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08784, pruned_loss=0.012, audio_tagging_loss=0.008558, over 3037634.18 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:35:28,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3391073.3333333335, ans=0.1 2023-11-26 12:35:35,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3391140.0, ans=0.125 2023-11-26 12:35:35,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3391140.0, ans=0.0 2023-11-26 12:35:45,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3391140.0, ans=0.2 2023-11-26 12:35:55,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3391206.6666666665, ans=0.1 2023-11-26 12:35:59,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2023-11-26 12:36:03,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.823e+01 8.759e+01 9.340e+01 1.006e+02 1.350e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 12:36:06,533 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508700 2023-11-26 12:36:09,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3391340.0, ans=0.125 2023-11-26 12:36:10,112 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3700, loss[loss=0.06339, simple_loss=0.09288, pruned_loss=0.008015, audio_tagging_loss=0.00894, over 15589.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08844, pruned_loss=0.012, audio_tagging_loss=0.008515, over 3038122.68 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:36:14,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3391340.0, ans=0.1 2023-11-26 12:36:17,321 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:36:18,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3391340.0, ans=0.125 2023-11-26 12:36:30,509 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:37:02,936 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508750 2023-11-26 12:37:04,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3391606.6666666665, ans=0.0 2023-11-26 12:37:06,113 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3750, loss[loss=0.06486, simple_loss=0.08453, pruned_loss=0.01263, audio_tagging_loss=0.009964, over 14813.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08834, pruned_loss=0.01194, audio_tagging_loss=0.008565, over 3043536.71 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:37:33,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2023-11-26 12:37:45,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3391873.3333333335, ans=0.125 2023-11-26 12:37:46,460 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:37:51,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-11-26 12:37:52,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2023-11-26 12:37:54,805 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.936e+01 9.506e+01 1.051e+02 1.254e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 12:37:58,586 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508800 2023-11-26 12:38:01,932 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3800, loss[loss=0.06035, simple_loss=0.07451, pruned_loss=0.01197, audio_tagging_loss=0.01113, over 14521.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.0888, pruned_loss=0.01202, audio_tagging_loss=0.00859, over 3040313.85 frames. 
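`grad_scale` in the batch records is the dynamic loss scale of fp16 training: earlier in this epoch it halved from 16.0 to 8.0 (around batch 2650), presumably after an overflowing step, and it has since grown back through 16.0 to 32.0 after long runs of clean steps. This trajectory is consistent with the standard PyTorch AMP mechanism:

```python
# Standard torch.cuda.amp training step, consistent with the grad_scale
# trajectory in the log (halve on overflow, grow back after clean steps).
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # internally skipped if gradients overflowed
    scaler.update()          # halves the scale on overflow, else slowly grows it
    return scaler.get_scale()
```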
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:38:04,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3392006.6666666665, ans=0.125 2023-11-26 12:38:08,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3392006.6666666665, ans=0.2 2023-11-26 12:38:49,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3392273.3333333335, ans=0.0 2023-11-26 12:38:54,727 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508850 2023-11-26 12:38:57,872 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3850, loss[loss=0.06307, simple_loss=0.08645, pruned_loss=0.01142, audio_tagging_loss=0.00842, over 16000.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08879, pruned_loss=0.01198, audio_tagging_loss=0.008704, over 3041759.42 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:39:15,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3392406.6666666665, ans=0.125 2023-11-26 12:39:32,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3392540.0, ans=0.0 2023-11-26 12:39:43,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.51 vs. limit=15.0 2023-11-26 12:39:46,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3392606.6666666665, ans=0.0 2023-11-26 12:39:49,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.737e+01 9.345e+01 1.016e+02 1.351e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 12:39:51,326 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508900 2023-11-26 12:39:53,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-26 12:39:54,434 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3900, loss[loss=0.06406, simple_loss=0.07736, pruned_loss=0.01301, audio_tagging_loss=0.01237, over 16022.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08868, pruned_loss=0.01202, audio_tagging_loss=0.00878, over 3047518.08 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:40:00,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3392673.3333333335, ans=0.2 2023-11-26 12:40:09,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3392740.0, ans=10.0 2023-11-26 12:40:21,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=12.0 2023-11-26 12:40:40,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3392940.0, ans=0.125 2023-11-26 12:40:46,835 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508950 2023-11-26 12:40:49,985 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3950, loss[loss=0.04627, simple_loss=0.06284, pruned_loss=0.004688, audio_tagging_loss=0.01017, over 14232.00 frames. 
], tot_loss[loss=0.06518, simple_loss=0.08855, pruned_loss=0.01206, audio_tagging_loss=0.00884, over 3042390.02 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:40:52,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3393006.6666666665, ans=0.035 2023-11-26 12:41:12,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3393140.0, ans=0.04949747468305833 2023-11-26 12:41:16,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3393140.0, ans=0.125 2023-11-26 12:41:23,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3393206.6666666665, ans=0.0 2023-11-26 12:41:25,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3393206.6666666665, ans=0.0 2023-11-26 12:41:31,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2023-11-26 12:41:40,954 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 8.928e+01 9.625e+01 1.027e+02 1.240e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 12:41:43,143 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509000 2023-11-26 12:41:46,549 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4000, loss[loss=0.05956, simple_loss=0.07843, pruned_loss=0.01225, audio_tagging_loss=0.00809, over 15358.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08921, pruned_loss=0.01227, audio_tagging_loss=0.008932, over 3034030.44 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:41:48,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3393340.0, ans=0.125 2023-11-26 12:41:50,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3393340.0, ans=0.125 2023-11-26 12:41:54,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3393340.0, ans=0.2 2023-11-26 12:41:57,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3393406.6666666665, ans=0.125 2023-11-26 12:42:09,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3393473.3333333335, ans=0.2 2023-11-26 12:42:12,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3393473.3333333335, ans=0.0 2023-11-26 12:42:12,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.64 vs. limit=6.0 2023-11-26 12:42:25,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3393540.0, ans=0.2 2023-11-26 12:42:39,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509050 2023-11-26 12:42:43,140 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4050, loss[loss=0.08828, simple_loss=0.1167, pruned_loss=0.02231, audio_tagging_loss=0.00761, over 15531.00 frames. 
], tot_loss[loss=0.06626, simple_loss=0.08974, pruned_loss=0.01242, audio_tagging_loss=0.008965, over 3033550.73 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:42:47,335 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:43:02,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3393740.0, ans=0.1 2023-11-26 12:43:17,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3393873.3333333335, ans=0.125 2023-11-26 12:43:19,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0 2023-11-26 12:43:25,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3393873.3333333335, ans=0.0 2023-11-26 12:43:33,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.904e+01 9.389e+01 9.930e+01 1.705e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 12:43:35,713 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509100 2023-11-26 12:43:38,792 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4100, loss[loss=0.07248, simple_loss=0.105, pruned_loss=0.01333, audio_tagging_loss=0.006649, over 14016.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.0906, pruned_loss=0.01237, audio_tagging_loss=0.008905, over 3035752.77 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:43:53,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3394073.3333333335, ans=0.0 2023-11-26 12:44:02,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3394140.0, ans=0.1 2023-11-26 12:44:15,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=22.5 2023-11-26 12:44:30,926 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509150 2023-11-26 12:44:34,590 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4150, loss[loss=0.05547, simple_loss=0.07534, pruned_loss=0.01132, audio_tagging_loss=0.006484, over 14581.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09094, pruned_loss=0.01247, audio_tagging_loss=0.008767, over 3039808.68 frames. 
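The learning rate in these records has ticked down from 1.58e-03 to 1.57e-03 over the course of the epoch, the signature of a smooth decay driven by global step and epoch counts rather than discrete milestones. icefall's Zipformer recipes use the Eden schedule for this; a sketch with assumed constants, where only the functional form is asserted and the parameter values are illustrative:

```python
# Sketch of the Eden learning-rate schedule used by icefall's Zipformer
# recipes. base_lr, lr_batches and lr_epochs are assumed values for
# illustration, not read from this run's configuration.
def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Late in training the decay is slow: with base_lr=0.045 this gives
# ~1.56e-03 at step ~5.09e5 in epoch 43, the same order as the logged lr.
print(f"{eden_lr(0.045, 509000, 43):.3g}")
```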
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:44:42,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3394340.0, ans=0.2 2023-11-26 12:44:50,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3394406.6666666665, ans=0.0 2023-11-26 12:45:01,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3394473.3333333335, ans=0.125 2023-11-26 12:45:03,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3394473.3333333335, ans=0.0 2023-11-26 12:45:17,396 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:45:18,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3394606.6666666665, ans=0.2 2023-11-26 12:45:25,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.865e+01 9.444e+01 1.016e+02 1.308e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-26 12:45:27,618 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509200 2023-11-26 12:45:31,516 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4200, loss[loss=0.05826, simple_loss=0.07923, pruned_loss=0.01043, audio_tagging_loss=0.008211, over 14408.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09039, pruned_loss=0.01257, audio_tagging_loss=0.008578, over 3041002.91 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:45:59,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3394806.6666666665, ans=0.0 2023-11-26 12:46:01,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3394806.6666666665, ans=0.125 2023-11-26 12:46:07,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3394873.3333333335, ans=0.2 2023-11-26 12:46:07,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3394873.3333333335, ans=0.125 2023-11-26 12:46:09,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3394873.3333333335, ans=0.0 2023-11-26 12:46:23,999 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509250 2023-11-26 12:46:27,115 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4250, loss[loss=0.06571, simple_loss=0.09631, pruned_loss=0.01207, audio_tagging_loss=0.005489, over 16642.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.0915, pruned_loss=0.01268, audio_tagging_loss=0.008436, over 3048091.10 frames. 
2023-11-26 12:46:35,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3395006.6666666665, ans=0.0
2023-11-26 12:46:36,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3395073.3333333335, ans=0.125
2023-11-26 12:46:42,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3395073.3333333335, ans=0.0
2023-11-26 12:46:48,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3395140.0, ans=0.0
2023-11-26 12:47:18,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.812e+01 8.820e+01 9.502e+01 1.020e+02 1.301e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-26 12:47:19,180 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509300
2023-11-26 12:47:22,846 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4300, loss[loss=0.08024, simple_loss=0.1077, pruned_loss=0.02009, audio_tagging_loss=0.0063, over 14722.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09105, pruned_loss=0.01262, audio_tagging_loss=0.008489, over 3046128.03 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:47:25,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3395340.0, ans=0.125
2023-11-26 12:47:26,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3395340.0, ans=0.5
2023-11-26 12:47:29,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3395340.0, ans=0.025
2023-11-26 12:47:36,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3395406.6666666665, ans=0.07
2023-11-26 12:47:37,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3395406.6666666665, ans=0.125
2023-11-26 12:47:39,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=22.5
2023-11-26 12:47:44,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3395406.6666666665, ans=0.125
2023-11-26 12:47:55,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0
2023-11-26 12:48:02,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3395540.0, ans=0.0
2023-11-26 12:48:08,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3395606.6666666665, ans=0.2
2023-11-26 12:48:16,098 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509350
2023-11-26 12:48:19,156 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4350, loss[loss=0.09161, simple_loss=0.1381, pruned_loss=0.01621, audio_tagging_loss=0.006353, over 15856.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09099, pruned_loss=0.01259, audio_tagging_loss=0.008517, over 3047749.57 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:48:23,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3395673.3333333335, ans=0.125
2023-11-26 12:48:27,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3395673.3333333335, ans=0.125
2023-11-26 12:48:28,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0
2023-11-26 12:48:36,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3395740.0, ans=0.125
2023-11-26 12:48:54,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3395873.3333333335, ans=0.125
2023-11-26 12:48:55,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3395873.3333333335, ans=0.0
2023-11-26 12:49:05,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0
2023-11-26 12:49:10,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 9.017e+01 9.685e+01 1.042e+02 1.339e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-26 12:49:11,965 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509400
2023-11-26 12:49:15,350 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4400, loss[loss=0.05728, simple_loss=0.07546, pruned_loss=0.01008, audio_tagging_loss=0.009473, over 16419.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.0901, pruned_loss=0.01248, audio_tagging_loss=0.00849, over 3047507.83 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 12:49:47,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3396140.0, ans=0.0
2023-11-26 12:49:54,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3396206.6666666665, ans=0.125
2023-11-26 12:49:56,023 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 12:49:58,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3396206.6666666665, ans=0.125
2023-11-26 12:49:58,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5
2023-11-26 12:50:05,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.82 vs. limit=10.0
2023-11-26 12:50:05,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3396273.3333333335, ans=0.125
2023-11-26 12:50:07,542 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509450
2023-11-26 12:50:10,720 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4450, loss[loss=0.06846, simple_loss=0.09099, pruned_loss=0.01205, audio_tagging_loss=0.01091, over 15508.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09034, pruned_loss=0.01256, audio_tagging_loss=0.008554, over 3049736.36 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:50:30,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3396406.6666666665, ans=0.04949747468305833
2023-11-26 12:50:41,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3396473.3333333335, ans=0.07
2023-11-26 12:50:55,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0
2023-11-26 12:51:03,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.973e+01 9.425e+01 1.012e+02 1.226e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-26 12:51:03,518 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509500
2023-11-26 12:51:07,259 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4500, loss[loss=0.0609, simple_loss=0.08367, pruned_loss=0.0106, audio_tagging_loss=0.008465, over 15300.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09056, pruned_loss=0.01258, audio_tagging_loss=0.008466, over 3046177.62 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:51:10,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3396673.3333333335, ans=0.2
2023-11-26 12:51:13,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3396673.3333333335, ans=0.1
2023-11-26 12:51:43,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3396873.3333333335, ans=0.0
2023-11-26 12:51:55,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.98 vs. limit=12.0
2023-11-26 12:51:59,849 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509550
2023-11-26 12:52:01,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3396940.0, ans=0.125
2023-11-26 12:52:03,034 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4550, loss[loss=0.0831, simple_loss=0.1158, pruned_loss=0.01859, audio_tagging_loss=0.006614, over 14965.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08974, pruned_loss=0.01245, audio_tagging_loss=0.008534, over 3041943.03 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:52:04,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3397006.6666666665, ans=0.05
2023-11-26 12:52:42,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3397206.6666666665, ans=0.1
2023-11-26 12:52:46,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3397273.3333333335, ans=0.95
2023-11-26 12:52:47,786 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 12:52:55,275 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509600
2023-11-26 12:52:56,224 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.693e+01 9.289e+01 1.006e+02 1.287e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-26 12:52:58,669 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4600, loss[loss=0.05681, simple_loss=0.07373, pruned_loss=0.01061, audio_tagging_loss=0.009332, over 13967.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08954, pruned_loss=0.01257, audio_tagging_loss=0.008641, over 3038051.90 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 12:53:14,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0
2023-11-26 12:53:39,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3397540.0, ans=0.1
2023-11-26 12:53:51,231 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509650
2023-11-26 12:53:54,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3397673.3333333335, ans=0.025
2023-11-26 12:53:54,951 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4650, loss[loss=0.05228, simple_loss=0.07189, pruned_loss=0.009679, audio_tagging_loss=0.00666, over 15859.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08891, pruned_loss=0.01228, audio_tagging_loss=0.008761, over 3044546.31 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 12:53:55,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3397673.3333333335, ans=0.0
2023-11-26 12:53:55,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0
2023-11-26 12:54:13,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3397740.0, ans=0.0
2023-11-26 12:54:27,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3397873.3333333335, ans=0.0
2023-11-26 12:54:28,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2023-11-26 12:54:36,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3397873.3333333335, ans=0.2
2023-11-26 12:54:48,062 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509700
2023-11-26 12:54:49,072 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.687e+01 9.484e+01 1.034e+02 1.399e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-26 12:54:51,723 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4700, loss[loss=0.07098, simple_loss=0.09008, pruned_loss=0.01539, audio_tagging_loss=0.01055, over 16003.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08965, pruned_loss=0.01238, audio_tagging_loss=0.008816, over 3046513.51 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 12:55:07,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0
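The recurring "Exclude cut" warnings show the length sanity filter at work: the AudioSet clips carry a dummy transcript, and a one-second clip yields only 23 feature frames after subsampling, fewer than its 24 BPE tokens, so no transducer alignment exists and the cut is skipped. A sketch of the check, with hypothetical argument names (the exact condition in train_asr.py may include extra margins):

    def is_trainable(num_frames_after_subsampling: int, num_tokens: int) -> bool:
        # A transducer needs at least one encoder frame per output token;
        # the excluded cuts above fail with 23 frames vs. 24 tokens.
        return num_frames_after_subsampling >= num_tokens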
2023-11-26 12:55:12,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0
2023-11-26 12:55:44,100 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509750
2023-11-26 12:55:47,202 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4750, loss[loss=0.08362, simple_loss=0.1175, pruned_loss=0.01711, audio_tagging_loss=0.007744, over 14944.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09018, pruned_loss=0.01243, audio_tagging_loss=0.008926, over 3048624.21 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 12:55:50,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3398340.0, ans=0.0
2023-11-26 12:56:39,946 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509800
2023-11-26 12:56:40,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.623e+01 9.231e+01 9.879e+01 9.064e+02, threshold=1.846e+02, percent-clipped=2.0
2023-11-26 12:56:43,264 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4800, loss[loss=0.08142, simple_loss=0.1151, pruned_loss=0.0174, audio_tagging_loss=0.00645, over 15326.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09007, pruned_loss=0.01234, audio_tagging_loss=0.008921, over 3048714.66 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:56:44,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3398673.3333333335, ans=0.0
2023-11-26 12:56:47,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0
2023-11-26 12:56:54,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0
2023-11-26 12:57:05,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3398806.6666666665, ans=0.09899494936611666
2023-11-26 12:57:26,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.02 vs. limit=6.0
2023-11-26 12:57:28,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3398940.0, ans=0.1
2023-11-26 12:57:33,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3398940.0, ans=0.125
2023-11-26 12:57:36,386 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509850
2023-11-26 12:57:39,522 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4850, loss[loss=0.07579, simple_loss=0.1182, pruned_loss=0.01113, audio_tagging_loss=0.005558, over 15713.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09104, pruned_loss=0.01234, audio_tagging_loss=0.008961, over 3052138.62 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:57:44,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3399006.6666666665, ans=0.2
2023-11-26 12:57:47,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3399006.6666666665, ans=0.0
2023-11-26 12:57:51,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3399073.3333333335, ans=0.125
2023-11-26 12:57:58,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3399073.3333333335, ans=0.07
2023-11-26 12:58:12,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.96 vs. limit=10.0
2023-11-26 12:58:23,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.49 vs. limit=10.0
2023-11-26 12:58:31,572 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509900
2023-11-26 12:58:33,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.785e+01 9.504e+01 1.038e+02 1.484e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-26 12:58:35,208 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4900, loss[loss=0.0602, simple_loss=0.07329, pruned_loss=0.01506, audio_tagging_loss=0.008489, over 15593.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09067, pruned_loss=0.0123, audio_tagging_loss=0.008945, over 3051508.11 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:58:54,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3399406.6666666665, ans=0.125
2023-11-26 12:59:01,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3399473.3333333335, ans=0.0
2023-11-26 12:59:16,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3399540.0, ans=0.05
2023-11-26 12:59:19,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3399606.6666666665, ans=0.95
2023-11-26 12:59:23,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0
2023-11-26 12:59:25,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3399606.6666666665, ans=0.0
2023-11-26 12:59:27,658 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509950
2023-11-26 12:59:30,725 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4950, loss[loss=0.05091, simple_loss=0.06374, pruned_loss=0.006657, audio_tagging_loss=0.01238, over 14486.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09076, pruned_loss=0.01238, audio_tagging_loss=0.008748, over 3050801.39 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:00:02,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3399806.6666666665, ans=0.1
2023-11-26 13:00:03,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3399873.3333333335, ans=0.125
2023-11-26 13:00:11,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3399873.3333333335, ans=0.125
2023-11-26 13:00:12,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5
2023-11-26 13:00:13,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3399873.3333333335, ans=0.125
2023-11-26 13:00:23,365 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510000
2023-11-26 13:00:23,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3399940.0, ans=0.125
2023-11-26 13:00:24,318 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.558e+01 9.135e+01 1.006e+02 1.501e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-26 13:00:27,003 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5000, loss[loss=0.08979, simple_loss=0.1292, pruned_loss=0.01889, audio_tagging_loss=0.006292, over 15582.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09072, pruned_loss=0.01227, audio_tagging_loss=0.008548, over 3047897.14 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:00:41,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3400073.3333333335, ans=0.125
2023-11-26 13:00:45,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3400073.3333333335, ans=0.0
2023-11-26 13:00:51,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3400140.0, ans=0.05
2023-11-26 13:00:54,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0
2023-11-26 13:01:10,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3400273.3333333335, ans=0.2
2023-11-26 13:01:16,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3400273.3333333335, ans=0.125
2023-11-26 13:01:18,676 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510050
2023-11-26 13:01:21,758 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5050, loss[loss=0.0667, simple_loss=0.08858, pruned_loss=0.01277, audio_tagging_loss=0.009635, over 15264.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.0906, pruned_loss=0.01223, audio_tagging_loss=0.008518, over 3047961.54 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:01:27,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=12.0
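The scaling.py:213 records are schedule lookups: each named hyperparameter (dropout probabilities, skip rates, bypass scale minimums) is a float scheduled on batch_count, and by batch_count ~3.4e6 most have settled at their final values (ans=0.125, ans=0.0, ans=0.2, ...). A minimal sketch of such a schedule as piecewise-linear interpolation (an assumption about ScheduledFloat's behaviour; see icefall's scaling.py for the real class):

    class PiecewiseLinearSchedule:
        """E.g. PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1)) decays
        from 0.3 to 0.1 over the first 20k batches, then stays flat."""
        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    # Linear interpolation between neighbouring knots.
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)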
2023-11-26 13:01:45,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3400473.3333333335, ans=0.1
2023-11-26 13:01:56,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0
2023-11-26 13:02:09,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3400606.6666666665, ans=0.125
2023-11-26 13:02:14,026 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510100
2023-11-26 13:02:14,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.485e+01 9.179e+01 9.722e+01 1.214e+02, threshold=1.836e+02, percent-clipped=0.0
2023-11-26 13:02:17,650 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5100, loss[loss=0.03275, simple_loss=0.03433, pruned_loss=0.005817, audio_tagging_loss=0.009764, over 15469.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09013, pruned_loss=0.01215, audio_tagging_loss=0.008494, over 3052859.38 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:02:27,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3400740.0, ans=0.1
2023-11-26 13:02:43,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3400806.6666666665, ans=0.2
2023-11-26 13:02:53,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3400873.3333333335, ans=0.1
2023-11-26 13:03:10,340 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510150
2023-11-26 13:03:13,917 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5150, loss[loss=0.05739, simple_loss=0.07465, pruned_loss=0.0105, audio_tagging_loss=0.009555, over 14463.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08868, pruned_loss=0.01209, audio_tagging_loss=0.00846, over 3051329.50 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:03:19,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3401006.6666666665, ans=0.125
2023-11-26 13:03:19,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3401006.6666666665, ans=0.0
2023-11-26 13:03:27,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3401073.3333333335, ans=0.125
2023-11-26 13:03:37,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3401140.0, ans=0.125
2023-11-26 13:03:42,870 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 13:03:48,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0
2023-11-26 13:03:52,959 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 13:04:06,052 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510200
2023-11-26 13:04:06,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.866e+01 9.575e+01 1.024e+02 1.389e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-26 13:04:07,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3401273.3333333335, ans=0.2
2023-11-26 13:04:09,402 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5200, loss[loss=0.05085, simple_loss=0.06632, pruned_loss=0.008347, audio_tagging_loss=0.009344, over 15294.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08969, pruned_loss=0.01238, audio_tagging_loss=0.008465, over 3041447.08 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 13:04:17,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3401340.0, ans=0.0
2023-11-26 13:04:29,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3401406.6666666665, ans=0.0
2023-11-26 13:04:40,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0
2023-11-26 13:05:01,178 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510250
2023-11-26 13:05:04,289 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5250, loss[loss=0.06669, simple_loss=0.09259, pruned_loss=0.01125, audio_tagging_loss=0.009144, over 16120.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09061, pruned_loss=0.01275, audio_tagging_loss=0.008411, over 3041845.13 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:05:38,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3401873.3333333335, ans=0.125
2023-11-26 13:05:39,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3401873.3333333335, ans=0.125
2023-11-26 13:05:39,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3401873.3333333335, ans=0.0
2023-11-26 13:05:45,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3401873.3333333335, ans=0.125
2023-11-26 13:05:58,245 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510300
2023-11-26 13:06:01,313 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.828e+01 9.543e+01 1.025e+02 1.295e+02, threshold=1.909e+02, percent-clipped=0.0
2023-11-26 13:06:01,340 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5300, loss[loss=0.07258, simple_loss=0.1016, pruned_loss=0.0127, audio_tagging_loss=0.009059, over 15069.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09052, pruned_loss=0.01255, audio_tagging_loss=0.008414, over 3039648.56 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:06:04,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.58 vs. limit=15.0
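Each scaling.py:1022 Whitening record compares a statistic of a module's output covariance against a limit; the penalty only activates when the metric exceeds the limit, which is why most lines sit below it. One plausible formulation of such a metric (an assumption, not necessarily the exact scaling.py code) measures how far the covariance is from a multiple of the identity: the ratio of the mean squared eigenvalue to the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as energy concentrates in a few directions:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations from one group.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues of the covariance
        return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)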
2023-11-26 13:06:23,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3402140.0, ans=0.125
2023-11-26 13:06:23,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3402140.0, ans=0.125
2023-11-26 13:06:40,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3402206.6666666665, ans=0.09899494936611666
2023-11-26 13:06:42,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3402206.6666666665, ans=0.0
2023-11-26 13:06:52,149 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 13:06:54,113 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510350
2023-11-26 13:06:57,227 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5350, loss[loss=0.06886, simple_loss=0.08822, pruned_loss=0.01389, audio_tagging_loss=0.01085, over 14884.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09121, pruned_loss=0.01267, audio_tagging_loss=0.008386, over 3044008.51 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:07:02,857 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 13:07:39,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0
2023-11-26 13:07:49,195 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510400
2023-11-26 13:07:52,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.116e+01 8.825e+01 9.466e+01 1.015e+02 1.457e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-26 13:07:52,642 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5400, loss[loss=0.06675, simple_loss=0.09659, pruned_loss=0.01221, audio_tagging_loss=0.006247, over 15354.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09111, pruned_loss=0.01268, audio_tagging_loss=0.008513, over 3043704.42 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:08:04,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3402740.0, ans=0.0
2023-11-26 13:08:09,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3402740.0, ans=0.1
2023-11-26 13:08:11,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3402740.0, ans=0.07
2023-11-26 13:08:21,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3402806.6666666665, ans=0.125
2023-11-26 13:08:23,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3402806.6666666665, ans=0.125
2023-11-26 13:08:29,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3402873.3333333335, ans=0.0
2023-11-26 13:08:45,465 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510450
2023-11-26 13:08:49,166 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5450, loss[loss=0.07554, simple_loss=0.1022, pruned_loss=0.01679, audio_tagging_loss=0.007638, over 15610.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09078, pruned_loss=0.01255, audio_tagging_loss=0.008594, over 3034792.20 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:08:51,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3403006.6666666665, ans=0.125
2023-11-26 13:08:56,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=15.0
2023-11-26 13:08:57,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3403006.6666666665, ans=0.2
2023-11-26 13:09:29,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3403206.6666666665, ans=0.125
2023-11-26 13:09:41,530 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510500
2023-11-26 13:09:43,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3403340.0, ans=0.125
2023-11-26 13:09:44,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.143e+01 8.724e+01 9.189e+01 1.004e+02 1.414e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-26 13:09:44,668 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5500, loss[loss=0.07894, simple_loss=0.1057, pruned_loss=0.01653, audio_tagging_loss=0.009569, over 15333.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09098, pruned_loss=0.0126, audio_tagging_loss=0.008591, over 3041103.49 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:09:50,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3403340.0, ans=0.0
2023-11-26 13:09:56,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3403406.6666666665, ans=0.125
2023-11-26 13:10:07,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3403473.3333333335, ans=0.125
2023-11-26 13:10:09,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3403473.3333333335, ans=0.0
2023-11-26 13:10:18,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3403540.0, ans=0.125
2023-11-26 13:10:23,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3403540.0, ans=0.2
2023-11-26 13:10:37,161 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510550
2023-11-26 13:10:40,230 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5550, loss[loss=0.07287, simple_loss=0.09774, pruned_loss=0.01418, audio_tagging_loss=0.009821, over 17013.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09066, pruned_loss=0.01247, audio_tagging_loss=0.008657, over 3047340.24 frames. ], batch size: 63, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:10:40,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3403673.3333333335, ans=0.0
2023-11-26 13:10:56,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3403740.0, ans=0.1
2023-11-26 13:10:56,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0
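Across these records grad_scale moves between 8.0 and 32.0, the usual signature of dynamic loss scaling under fp16 training: the scale is halved whenever a step produces inf/nan gradients and grown back after a run of clean steps. A sketch with torch.cuda.amp.GradScaler configured to behave that way (the numeric arguments are illustrative, not the recipe's settings):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,       # comparable to the grad_scale values logged here
        backoff_factor=0.5,    # halve on overflow (32.0 -> 16.0 -> 8.0)
        growth_factor=2.0,     # double again after growth_interval clean steps
        growth_interval=2000,  # illustrative
    )

    # Typical step: scale the loss, step through the scaler, update the scale.
    # scaler.scale(loss).backward()
    # scaler.step(optimizer)
    # scaler.update()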
2023-11-26 13:11:10,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3403806.6666666665, ans=0.0
2023-11-26 13:11:10,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3403806.6666666665, ans=0.125
2023-11-26 13:11:23,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3403873.3333333335, ans=0.125
2023-11-26 13:11:26,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3403940.0, ans=0.0
2023-11-26 13:11:33,011 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510600
2023-11-26 13:11:33,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3403940.0, ans=0.125
2023-11-26 13:11:33,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3403940.0, ans=0.125
2023-11-26 13:11:35,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3404006.6666666665, ans=0.125
2023-11-26 13:11:36,396 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.941e+01 9.582e+01 1.033e+02 2.288e+02, threshold=1.916e+02, percent-clipped=1.0
2023-11-26 13:11:36,428 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5600, loss[loss=0.05875, simple_loss=0.07701, pruned_loss=0.01048, audio_tagging_loss=0.009764, over 16318.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09112, pruned_loss=0.01251, audio_tagging_loss=0.008724, over 3053626.46 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:11:48,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3404073.3333333335, ans=0.125
2023-11-26 13:12:15,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3404206.6666666665, ans=0.0
2023-11-26 13:12:18,015 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 13:12:22,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0
2023-11-26 13:12:30,144 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510650
2023-11-26 13:12:33,244 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5650, loss[loss=0.05604, simple_loss=0.06843, pruned_loss=0.008511, audio_tagging_loss=0.01331, over 14478.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09001, pruned_loss=0.01228, audio_tagging_loss=0.008794, over 3049923.03 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:12:33,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404340.0, ans=0.1
2023-11-26 13:12:54,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0
2023-11-26 13:13:05,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3404540.0, ans=0.125
2023-11-26 13:13:08,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3404540.0, ans=0.125
2023-11-26 13:13:08,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3404540.0, ans=0.07
2023-11-26 13:13:17,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3404606.6666666665, ans=0.0
2023-11-26 13:13:18,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3404606.6666666665, ans=0.125
2023-11-26 13:13:25,378 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510700
2023-11-26 13:13:25,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3404606.6666666665, ans=0.125
2023-11-26 13:13:28,511 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.672e+01 9.212e+01 9.928e+01 1.414e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-26 13:13:28,540 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5700, loss[loss=0.06488, simple_loss=0.0739, pruned_loss=0.01887, audio_tagging_loss=0.009061, over 14272.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09031, pruned_loss=0.01243, audio_tagging_loss=0.008813, over 3045543.56 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:13:30,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0
2023-11-26 13:13:33,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3404673.3333333335, ans=0.5
2023-11-26 13:13:35,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3404673.3333333335, ans=0.125
2023-11-26 13:13:38,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3404740.0, ans=0.2
2023-11-26 13:13:56,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3404806.6666666665, ans=0.2
2023-11-26 13:14:09,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3404873.3333333335, ans=0.07
2023-11-26 13:14:21,364 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510750
2023-11-26 13:14:24,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0
2023-11-26 13:14:24,492 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5750, loss[loss=0.0635, simple_loss=0.08438, pruned_loss=0.01098, audio_tagging_loss=0.01033, over 15225.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08974, pruned_loss=0.01229, audio_tagging_loss=0.008724, over 3048105.89 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:14:52,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3405140.0, ans=0.125
2023-11-26 13:15:05,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0
2023-11-26 13:15:17,202 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510800
2023-11-26 13:15:20,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.469e+01 9.302e+01 1.019e+02 1.569e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-26 13:15:20,849 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5800, loss[loss=0.04807, simple_loss=0.06275, pruned_loss=0.007116, audio_tagging_loss=0.009574, over 14806.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08992, pruned_loss=0.01223, audio_tagging_loss=0.008684, over 3043113.28 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:15:29,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.38 vs. limit=5.0
2023-11-26 13:15:44,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0
2023-11-26 13:15:56,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3405540.0, ans=0.1
2023-11-26 13:15:59,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3405540.0, ans=0.125
2023-11-26 13:16:08,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3405606.6666666665, ans=0.125
2023-11-26 13:16:10,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3405606.6666666665, ans=0.125
2023-11-26 13:16:13,317 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510850
2023-11-26 13:16:16,494 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5850, loss[loss=0.07456, simple_loss=0.1008, pruned_loss=0.01751, audio_tagging_loss=0.006655, over 14383.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.0895, pruned_loss=0.0122, audio_tagging_loss=0.008688, over 3038142.47 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:16:24,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3405673.3333333335, ans=0.125
2023-11-26 13:16:48,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3405873.3333333335, ans=0.125
2023-11-26 13:17:08,744 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510900
2023-11-26 13:17:09,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3405940.0, ans=0.2
2023-11-26 13:17:11,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.774e+01 9.383e+01 1.009e+02 2.236e+02, threshold=1.877e+02, percent-clipped=1.0
2023-11-26 13:17:11,794 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5900, loss[loss=0.06863, simple_loss=0.09026, pruned_loss=0.01595, audio_tagging_loss=0.007556, over 15746.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0904, pruned_loss=0.01216, audio_tagging_loss=0.008618, over 3039913.73 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:17:32,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3406073.3333333335, ans=0.125
2023-11-26 13:17:43,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3406140.0, ans=0.2
2023-11-26 13:17:56,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5
2023-11-26 13:18:04,168 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510950
2023-11-26 13:18:06,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3406340.0, ans=0.0
2023-11-26 13:18:07,251 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5950, loss[loss=0.05963, simple_loss=0.07865, pruned_loss=0.01007, audio_tagging_loss=0.01023, over 15086.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09119, pruned_loss=0.01236, audio_tagging_loss=0.008656, over 3044646.77 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:18:13,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3406340.0, ans=0.1
2023-11-26 13:18:21,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3406406.6666666665, ans=0.0
2023-11-26 13:18:29,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3406473.3333333335, ans=0.07
2023-11-26 13:18:59,798 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511000
2023-11-26 13:19:03,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 8.762e+01 9.206e+01 9.891e+01 1.298e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-26 13:19:03,735 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6000, loss[loss=0.05551, simple_loss=0.07459, pruned_loss=0.008955, audio_tagging_loss=0.00926, over 15638.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08993, pruned_loss=0.01218, audio_tagging_loss=0.008697, over 3040870.93 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0
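The four numbers inside each tot_loss[...] are not independent: to the printed precision they combine as

    loss ~= 0.5 * simple_loss + pruned_loss + audio_tagging_loss

e.g. for batch 5900 above, 0.5 * 0.0904 + 0.01216 + 0.008618 = 0.065978 ~= 0.06598. This is consistent with a simple-loss scale of 0.5 and an audio-tagging scale of 1.0; treat the exact weighting as an inference from the logged values rather than a statement of the training code.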
2023-11-26 13:19:03,736 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-26 13:19:26,301 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0216, 5.8788, 5.6997, 5.6040], device='cuda:1')
2023-11-26 13:19:36,325 INFO [train_asr.py:1267] (1/4) Epoch 43, validation: loss=0.05784, simple_loss=0.05057, pruned_loss=0.005191, audio_tagging_loss=0.02736, over 4681554.00 frames.
2023-11-26 13:19:36,325 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-26 13:19:39,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3406673.3333333335, ans=0.125
2023-11-26 13:19:49,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3406740.0, ans=0.1
2023-11-26 13:20:00,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3406806.6666666665, ans=0.125
2023-11-26 13:20:09,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3406873.3333333335, ans=0.125
2023-11-26 13:20:17,570 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 13:20:22,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3406940.0, ans=0.05
2023-11-26 13:20:28,769 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511050
2023-11-26 13:20:28,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3406940.0, ans=0.125
2023-11-26 13:20:32,337 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6050, loss[loss=0.05963, simple_loss=0.07816, pruned_loss=0.01348, audio_tagging_loss=0.007068, over 15240.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09041, pruned_loss=0.01248, audio_tagging_loss=0.008633, over 3045624.57 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:20:32,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0
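The zipformer.py:1877 diagnostics printed while computing the validation loss summarize how peaked each attention head is: an entropy near ln(T) means nearly uniform weights over T key positions, while small values mean sharply focused attention. A sketch of one way to compute such per-head entropies (an assumption about the diagnostic, shown for a (num_heads, tgt_len, src_len) weight tensor):

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, tgt_len, src_len), each row a distribution over keys.
        p = attn.clamp(min=1e-20)
        entropy = -(p * p.log()).sum(dim=-1)  # entropy per query position
        return entropy.mean(dim=-1)           # one value per head, as in the log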
2023-11-26 13:20:33,626 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 13:20:33,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3407006.6666666665, ans=0.0
2023-11-26 13:21:06,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3407206.6666666665, ans=0.125
2023-11-26 13:21:19,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3407273.3333333335, ans=0.2
2023-11-26 13:21:23,760 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511100
2023-11-26 13:21:24,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3407273.3333333335, ans=0.125
2023-11-26 13:21:26,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3407340.0, ans=0.125
2023-11-26 13:21:27,482 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6100, loss[loss=0.06616, simple_loss=0.08998, pruned_loss=0.0121, audio_tagging_loss=0.009071, over 14250.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08986, pruned_loss=0.01235, audio_tagging_loss=0.008708, over 3051199.24 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:21:28,499 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.895e+01 8.831e+01 9.526e+01 1.012e+02 1.265e+02, threshold=1.905e+02, percent-clipped=0.0
2023-11-26 13:22:03,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3407540.0, ans=0.125
2023-11-26 13:22:08,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0
2023-11-26 13:22:19,339 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511150
2023-11-26 13:22:23,025 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6150, loss[loss=0.05951, simple_loss=0.08221, pruned_loss=0.009968, audio_tagging_loss=0.008432, over 14796.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09026, pruned_loss=0.01236, audio_tagging_loss=0.008794, over 3051632.24 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:22:29,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3407673.3333333335, ans=0.0
2023-11-26 13:22:32,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3407673.3333333335, ans=0.0
2023-11-26 13:22:47,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. limit=10.0
2023-11-26 13:23:10,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3407940.0, ans=0.125
2023-11-26 13:23:10,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3407940.0, ans=15.0
2023-11-26 13:23:15,412 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511200
2023-11-26 13:23:19,295 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6200, loss[loss=0.06412, simple_loss=0.08622, pruned_loss=0.01236, audio_tagging_loss=0.008647, over 16424.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08993, pruned_loss=0.01239, audio_tagging_loss=0.008822, over 3058196.24 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:23:20,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.535e+01 9.196e+01 1.004e+02 1.259e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-26 13:23:28,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.99 vs. limit=22.5
2023-11-26 13:23:38,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
2023-11-26 13:23:58,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3408206.6666666665, ans=0.125
2023-11-26 13:24:00,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3408206.6666666665, ans=0.125
2023-11-26 13:24:07,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3408273.3333333335, ans=0.1
2023-11-26 13:24:10,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3408273.3333333335, ans=0.125
2023-11-26 13:24:11,505 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511250
2023-11-26 13:24:14,669 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6250, loss[loss=0.0702, simple_loss=0.09061, pruned_loss=0.01436, audio_tagging_loss=0.01053, over 15830.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08986, pruned_loss=0.01253, audio_tagging_loss=0.008892, over 3053900.24 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:24:15,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3408340.0, ans=0.1
2023-11-26 13:24:20,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3408340.0, ans=0.0
2023-11-26 13:24:32,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=12.0
2023-11-26 13:24:55,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3408540.0, ans=0.05
2023-11-26 13:25:00,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0
2023-11-26 13:25:02,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3408606.6666666665, ans=0.2
2023-11-26 13:25:07,719 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511300
2023-11-26 13:25:10,842 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6300, loss[loss=0.07788, simple_loss=0.1019, pruned_loss=0.01723, audio_tagging_loss=0.00971, over 15273.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08981, pruned_loss=0.01273, audio_tagging_loss=0.008951, over 3046944.76 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:25:12,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.818e+01 9.508e+01 1.037e+02 1.214e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-26 13:25:22,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3408740.0, ans=0.125
2023-11-26 13:25:25,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3408740.0, ans=0.0
2023-11-26 13:25:56,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3408940.0, ans=0.0
2023-11-26 13:26:02,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3408940.0, ans=0.0
2023-11-26 13:26:04,121 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511350
2023-11-26 13:26:07,279 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6350, loss[loss=0.06888, simple_loss=0.101, pruned_loss=0.01077, audio_tagging_loss=0.007616, over 14676.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08991, pruned_loss=0.0126, audio_tagging_loss=0.008951, over 3047966.71 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:26:07,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3409006.6666666665, ans=0.125
2023-11-26 13:26:16,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3409006.6666666665, ans=0.0
2023-11-26 13:26:29,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0
2023-11-26 13:26:30,314 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 13:26:45,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3409206.6666666665, ans=0.0
2023-11-26 13:26:59,933 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511400
2023-11-26 13:27:03,271 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6400, loss[loss=0.07016, simple_loss=0.09219, pruned_loss=0.01336, audio_tagging_loss=0.0107, over 15068.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09009, pruned_loss=0.01253, audio_tagging_loss=0.009038, over 3044286.45 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 13:27:03,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=22.5
2023-11-26 13:27:04,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.943e+01 9.499e+01 1.031e+02 1.393e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-26 13:27:23,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5
2023-11-26 13:27:28,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3409473.3333333335, ans=0.2
2023-11-26 13:27:37,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.90 vs. limit=10.0
2023-11-26 13:27:51,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3409606.6666666665, ans=0.0
2023-11-26 13:27:53,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3409606.6666666665, ans=0.1
2023-11-26 13:27:55,517 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511450
2023-11-26 13:27:58,639 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6450, loss[loss=0.07392, simple_loss=0.1046, pruned_loss=0.01296, audio_tagging_loss=0.008672, over 16716.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09018, pruned_loss=0.01232, audio_tagging_loss=0.009075, over 3042300.54 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 13:28:21,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3409806.6666666665, ans=0.0
2023-11-26 13:28:51,655 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511500
2023-11-26 13:28:54,735 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6500, loss[loss=0.07783, simple_loss=0.1115, pruned_loss=0.015, audio_tagging_loss=0.007067, over 16057.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09024, pruned_loss=0.01226, audio_tagging_loss=0.008985, over 3053200.19 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 13:28:55,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=15.0
2023-11-26 13:28:55,795 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.812e+01 9.257e+01 9.962e+01 1.246e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-26 13:28:57,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3410006.6666666665, ans=0.2
2023-11-26 13:29:06,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0
2023-11-26 13:29:12,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3410073.3333333335, ans=0.125
2023-11-26 13:29:13,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2023-11-26 13:29:23,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3410140.0, ans=0.125
2023-11-26 13:29:26,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3410206.6666666665, ans=0.2
2023-11-26 13:29:31,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3410206.6666666665, ans=0.0
2023-11-26 13:29:41,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.53 vs. limit=10.0
2023-11-26 13:29:45,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3410273.3333333335, ans=0.0
2023-11-26 13:29:47,559 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511550
2023-11-26 13:29:50,642 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6550, loss[loss=0.06701, simple_loss=0.09212, pruned_loss=0.01406, audio_tagging_loss=0.006889, over 14241.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.0905, pruned_loss=0.01228, audio_tagging_loss=0.008807, over 3055555.56 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 13:29:50,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3410340.0, ans=0.125
2023-11-26 13:29:54,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3410340.0, ans=0.0
2023-11-26 13:29:55,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3410340.0, ans=0.125
2023-11-26 13:30:18,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3410473.3333333335, ans=0.125
2023-11-26 13:30:36,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3410606.6666666665, ans=0.125
2023-11-26 13:30:36,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3410606.6666666665, ans=0.125
2023-11-26 13:30:42,992 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511600
2023-11-26 13:30:46,416 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6600, loss[loss=0.06621, simple_loss=0.08843, pruned_loss=0.01476, audio_tagging_loss=0.007236, over 15117.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09059, pruned_loss=0.01221, audio_tagging_loss=0.008656, over 3058197.34 frames.
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:30:47,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.677e+01 9.398e+01 1.026e+02 1.405e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 13:30:54,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3410673.3333333335, ans=0.1 2023-11-26 13:30:57,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3410740.0, ans=0.2 2023-11-26 13:31:05,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3410740.0, ans=0.95 2023-11-26 13:31:23,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-26 13:31:24,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3410873.3333333335, ans=0.125 2023-11-26 13:31:29,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3410873.3333333335, ans=0.125 2023-11-26 13:31:38,994 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511650 2023-11-26 13:31:40,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3410940.0, ans=0.1 2023-11-26 13:31:40,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3410940.0, ans=0.125 2023-11-26 13:31:42,725 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6650, loss[loss=0.0666, simple_loss=0.09235, pruned_loss=0.01269, audio_tagging_loss=0.007729, over 14862.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09106, pruned_loss=0.01228, audio_tagging_loss=0.008625, over 3049145.71 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:32:26,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3411273.3333333335, ans=0.125 2023-11-26 13:32:35,535 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511700 2023-11-26 13:32:36,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3411273.3333333335, ans=0.0 2023-11-26 13:32:38,678 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6700, loss[loss=0.05509, simple_loss=0.07128, pruned_loss=0.01134, audio_tagging_loss=0.008111, over 16968.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09035, pruned_loss=0.01212, audio_tagging_loss=0.008584, over 3041317.62 frames. 
], batch size: 70, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:32:40,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.754e+01 9.381e+01 1.004e+02 1.497e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 13:32:40,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3411340.0, ans=0.0 2023-11-26 13:32:47,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3411340.0, ans=0.1 2023-11-26 13:32:58,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3411406.6666666665, ans=0.125 2023-11-26 13:33:01,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5 2023-11-26 13:33:14,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3411540.0, ans=0.2 2023-11-26 13:33:30,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3411606.6666666665, ans=0.125 2023-11-26 13:33:31,084 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511750 2023-11-26 13:33:34,182 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6750, loss[loss=0.06515, simple_loss=0.08605, pruned_loss=0.01341, audio_tagging_loss=0.008706, over 14952.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08936, pruned_loss=0.01212, audio_tagging_loss=0.008606, over 3035444.20 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:33:34,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3411673.3333333335, ans=0.125 2023-11-26 13:33:37,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3411673.3333333335, ans=0.2 2023-11-26 13:34:16,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3411873.3333333335, ans=0.125 2023-11-26 13:34:26,690 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511800 2023-11-26 13:34:30,028 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6800, loss[loss=0.05842, simple_loss=0.07187, pruned_loss=0.01226, audio_tagging_loss=0.01022, over 16029.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08864, pruned_loss=0.01206, audio_tagging_loss=0.008664, over 3034984.92 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:34:30,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3412006.6666666665, ans=0.0 2023-11-26 13:34:32,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.841e+01 9.365e+01 1.006e+02 1.409e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 13:34:37,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3412006.6666666665, ans=0.125 2023-11-26 13:34:39,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.39 vs. 
limit=15.0 2023-11-26 13:34:55,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3412140.0, ans=0.0 2023-11-26 13:35:05,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3412206.6666666665, ans=0.125 2023-11-26 13:35:20,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2023-11-26 13:35:24,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511850 2023-11-26 13:35:27,362 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6850, loss[loss=0.06166, simple_loss=0.08888, pruned_loss=0.01108, audio_tagging_loss=0.006143, over 14752.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08923, pruned_loss=0.01218, audio_tagging_loss=0.008642, over 3036082.96 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:35:37,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3412406.6666666665, ans=0.125 2023-11-26 13:35:43,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3412406.6666666665, ans=15.0 2023-11-26 13:35:59,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=22.5 2023-11-26 13:36:19,387 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511900 2023-11-26 13:36:22,524 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6900, loss[loss=0.06227, simple_loss=0.08626, pruned_loss=0.012, audio_tagging_loss=0.007136, over 15184.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09024, pruned_loss=0.01223, audio_tagging_loss=0.008616, over 3031027.02 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:36:24,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.611e+01 9.198e+01 9.954e+01 1.491e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 13:36:26,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3412673.3333333335, ans=0.125 2023-11-26 13:36:31,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3412673.3333333335, ans=0.1 2023-11-26 13:37:08,634 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 13:37:15,715 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511950 2023-11-26 13:37:18,933 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6950, loss[loss=0.05919, simple_loss=0.08118, pruned_loss=0.0112, audio_tagging_loss=0.0074, over 15045.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09042, pruned_loss=0.01234, audio_tagging_loss=0.008614, over 3030109.01 frames. 
], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:37:24,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-11-26 13:37:37,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3413073.3333333335, ans=0.1 2023-11-26 13:37:39,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3413073.3333333335, ans=0.125 2023-11-26 13:38:07,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3413273.3333333335, ans=0.125 2023-11-26 13:38:11,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512000 2023-11-26 13:38:17,724 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7000, loss[loss=0.07297, simple_loss=0.09574, pruned_loss=0.01715, audio_tagging_loss=0.00795, over 15243.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.0896, pruned_loss=0.01228, audio_tagging_loss=0.008702, over 3031332.36 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:38:20,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 8.732e+01 9.495e+01 1.005e+02 2.082e+02, threshold=1.899e+02, percent-clipped=1.0 2023-11-26 13:38:30,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3413406.6666666665, ans=0.1 2023-11-26 13:38:37,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3413406.6666666665, ans=0.125 2023-11-26 13:38:40,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3413473.3333333335, ans=0.0 2023-11-26 13:38:44,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0 2023-11-26 13:38:47,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-26 13:39:04,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3413606.6666666665, ans=0.2 2023-11-26 13:39:05,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3413606.6666666665, ans=0.0 2023-11-26 13:39:08,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3413606.6666666665, ans=0.0 2023-11-26 13:39:10,701 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512050 2023-11-26 13:39:13,848 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7050, loss[loss=0.07392, simple_loss=0.09731, pruned_loss=0.01631, audio_tagging_loss=0.008958, over 13889.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08925, pruned_loss=0.01241, audio_tagging_loss=0.008697, over 3026385.49 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:39:26,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3413740.0, ans=0.05 2023-11-26 13:39:30,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-26 13:39:37,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3413806.6666666665, ans=0.125 2023-11-26 13:39:55,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.07 vs. limit=10.0 2023-11-26 13:39:57,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3413940.0, ans=0.0 2023-11-26 13:40:06,121 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512100 2023-11-26 13:40:06,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=15.0 2023-11-26 13:40:09,740 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7100, loss[loss=0.0658, simple_loss=0.09209, pruned_loss=0.009693, audio_tagging_loss=0.01006, over 15695.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08872, pruned_loss=0.01223, audio_tagging_loss=0.008754, over 3033714.18 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:40:10,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3414006.6666666665, ans=0.5 2023-11-26 13:40:12,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.711e+01 9.572e+01 1.021e+02 1.655e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-26 13:40:22,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=12.0 2023-11-26 13:40:34,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3414140.0, ans=0.2 2023-11-26 13:40:42,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0 2023-11-26 13:40:45,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3414206.6666666665, ans=0.0 2023-11-26 13:41:00,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=12.0 2023-11-26 13:41:02,407 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512150 2023-11-26 13:41:05,579 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7150, loss[loss=0.05857, simple_loss=0.07723, pruned_loss=0.008612, audio_tagging_loss=0.01134, over 13898.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08946, pruned_loss=0.01225, audio_tagging_loss=0.008801, over 3033081.47 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:41:12,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3414340.0, ans=0.125 2023-11-26 13:41:17,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3414406.6666666665, ans=0.125 2023-11-26 13:41:58,323 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512200 2023-11-26 13:41:59,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3414606.6666666665, ans=0.05 2023-11-26 13:42:02,392 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7200, loss[loss=0.0634, simple_loss=0.09981, pruned_loss=0.007345, audio_tagging_loss=0.006149, over 15760.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08958, pruned_loss=0.01221, audio_tagging_loss=0.008863, over 3038040.76 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:42:05,619 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.947e+01 9.542e+01 1.037e+02 1.437e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-26 13:42:06,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3414673.3333333335, ans=10.0 2023-11-26 13:42:09,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3414673.3333333335, ans=0.125 2023-11-26 13:42:19,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3414740.0, ans=0.95 2023-11-26 13:42:25,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3414806.6666666665, ans=0.0 2023-11-26 13:42:31,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3414806.6666666665, ans=0.1 2023-11-26 13:42:54,844 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512250 2023-11-26 13:42:57,947 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7250, loss[loss=0.06387, simple_loss=0.07864, pruned_loss=0.01281, audio_tagging_loss=0.01174, over 14374.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08915, pruned_loss=0.01214, audio_tagging_loss=0.00901, over 3038046.10 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:43:08,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3415073.3333333335, ans=0.1 2023-11-26 13:43:28,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3415140.0, ans=0.2 2023-11-26 13:43:33,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3415206.6666666665, ans=0.07 2023-11-26 13:43:37,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3415206.6666666665, ans=0.2 2023-11-26 13:43:39,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3415206.6666666665, ans=0.0 2023-11-26 13:43:48,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3415273.3333333335, ans=0.02 2023-11-26 13:43:51,078 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512300 2023-11-26 13:43:52,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3415273.3333333335, ans=0.125 2023-11-26 13:43:54,220 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7300, loss[loss=0.09305, simple_loss=0.1331, pruned_loss=0.02044, audio_tagging_loss=0.006074, over 16099.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08869, pruned_loss=0.0121, audio_tagging_loss=0.008991, over 3038587.38 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:43:59,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.665e+01 8.745e+01 9.385e+01 1.003e+02 1.402e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 13:44:02,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3415340.0, ans=0.125 2023-11-26 13:44:05,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3415406.6666666665, ans=0.125 2023-11-26 13:44:24,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3415473.3333333335, ans=0.125 2023-11-26 13:44:25,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3415473.3333333335, ans=0.0 2023-11-26 13:44:33,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3415540.0, ans=0.2 2023-11-26 13:44:35,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. 
limit=6.0 2023-11-26 13:44:42,122 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:44:47,370 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512350 2023-11-26 13:44:47,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3415606.6666666665, ans=0.1 2023-11-26 13:44:48,603 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:44:48,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.11 vs. limit=10.0 2023-11-26 13:44:50,501 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7350, loss[loss=0.07932, simple_loss=0.1101, pruned_loss=0.01792, audio_tagging_loss=0.006367, over 15480.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08962, pruned_loss=0.01224, audio_tagging_loss=0.008771, over 3042410.73 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:45:03,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.25 vs. limit=15.0 2023-11-26 13:45:14,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2023-11-26 13:45:30,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3415873.3333333335, ans=0.0 2023-11-26 13:45:34,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3415940.0, ans=0.0 2023-11-26 13:45:40,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2023-11-26 13:45:43,381 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512400 2023-11-26 13:45:46,743 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7400, loss[loss=0.07602, simple_loss=0.1067, pruned_loss=0.01504, audio_tagging_loss=0.00762, over 15645.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08937, pruned_loss=0.01224, audio_tagging_loss=0.008709, over 3042066.30 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:45:50,936 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.921e+01 9.521e+01 1.008e+02 1.264e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 13:45:51,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3416006.6666666665, ans=0.125 2023-11-26 13:46:06,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3416073.3333333335, ans=0.125 2023-11-26 13:46:18,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3416140.0, ans=0.125 2023-11-26 13:46:40,208 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512450 2023-11-26 13:46:43,240 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7450, loss[loss=0.05977, simple_loss=0.08312, pruned_loss=0.01079, audio_tagging_loss=0.007417, over 15154.00 frames. 
], tot_loss[loss=0.06553, simple_loss=0.08913, pruned_loss=0.0123, audio_tagging_loss=0.008661, over 3043959.53 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:47:01,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3416406.6666666665, ans=0.125 2023-11-26 13:47:35,973 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512500 2023-11-26 13:47:39,112 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7500, loss[loss=0.06855, simple_loss=0.09392, pruned_loss=0.013, audio_tagging_loss=0.008586, over 14984.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08948, pruned_loss=0.01233, audio_tagging_loss=0.008603, over 3044303.87 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:47:40,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=22.5 2023-11-26 13:47:43,323 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.713e+01 9.201e+01 9.904e+01 1.159e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 13:47:53,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3416740.0, ans=0.125 2023-11-26 13:48:31,302 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512550 2023-11-26 13:48:34,366 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7550, loss[loss=0.05842, simple_loss=0.07281, pruned_loss=0.01048, audio_tagging_loss=0.01153, over 14907.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08957, pruned_loss=0.0123, audio_tagging_loss=0.008619, over 3047308.89 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:48:39,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3417006.6666666665, ans=0.1 2023-11-26 13:48:43,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=22.5 2023-11-26 13:48:49,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3417073.3333333335, ans=0.0 2023-11-26 13:49:01,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-26 13:49:05,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3417140.0, ans=0.2 2023-11-26 13:49:08,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3417206.6666666665, ans=0.125 2023-11-26 13:49:17,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3417206.6666666665, ans=0.125 2023-11-26 13:49:27,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512600 2023-11-26 13:49:31,666 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7600, loss[loss=0.04788, simple_loss=0.06162, pruned_loss=0.007076, audio_tagging_loss=0.009994, over 14554.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08885, pruned_loss=0.01221, audio_tagging_loss=0.008626, over 3045265.67 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:49:32,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3417340.0, ans=0.0 2023-11-26 13:49:34,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.68 vs. limit=10.0 2023-11-26 13:49:35,805 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.691e+01 9.310e+01 9.815e+01 1.310e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 13:50:05,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5 2023-11-26 13:50:24,318 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512650 2023-11-26 13:50:27,354 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7650, loss[loss=0.07865, simple_loss=0.1114, pruned_loss=0.01463, audio_tagging_loss=0.00831, over 15058.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08924, pruned_loss=0.01238, audio_tagging_loss=0.00862, over 3048002.16 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:50:35,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3417673.3333333335, ans=0.2 2023-11-26 13:51:07,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3417873.3333333335, ans=0.0 2023-11-26 13:51:16,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=22.5 2023-11-26 13:51:18,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3417940.0, ans=10.0 2023-11-26 13:51:19,505 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512700 2023-11-26 13:51:20,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3417940.0, ans=0.125 2023-11-26 13:51:22,607 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7700, loss[loss=0.06896, simple_loss=0.09661, pruned_loss=0.01172, audio_tagging_loss=0.008935, over 15281.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08993, pruned_loss=0.01235, audio_tagging_loss=0.0086, over 3044013.00 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:51:26,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.781e+01 9.620e+01 1.045e+02 1.417e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 13:51:40,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3418073.3333333335, ans=0.125 2023-11-26 13:51:40,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3418073.3333333335, ans=0.125 2023-11-26 13:51:42,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3418073.3333333335, ans=0.0 2023-11-26 13:51:46,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.11 vs. 
limit=12.0 2023-11-26 13:51:54,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3418140.0, ans=0.125 2023-11-26 13:52:15,475 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512750 2023-11-26 13:52:19,057 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7750, loss[loss=0.04844, simple_loss=0.06973, pruned_loss=0.006039, audio_tagging_loss=0.007532, over 15368.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08955, pruned_loss=0.01218, audio_tagging_loss=0.008604, over 3052038.73 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:53:07,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3418606.6666666665, ans=0.1 2023-11-26 13:53:12,071 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512800 2023-11-26 13:53:12,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3418606.6666666665, ans=0.2 2023-11-26 13:53:13,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3418606.6666666665, ans=0.0 2023-11-26 13:53:15,459 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7800, loss[loss=0.06738, simple_loss=0.09715, pruned_loss=0.0105, audio_tagging_loss=0.008303, over 15682.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08948, pruned_loss=0.01223, audio_tagging_loss=0.008695, over 3052631.02 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:53:19,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 9.103e+01 9.758e+01 1.031e+02 1.342e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-26 13:53:22,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3418673.3333333335, ans=0.125 2023-11-26 13:53:31,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=12.0 2023-11-26 13:53:44,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.85 vs. limit=6.0 2023-11-26 13:53:51,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2023-11-26 13:54:07,467 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512850 2023-11-26 13:54:10,519 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7850, loss[loss=0.05415, simple_loss=0.07314, pruned_loss=0.009417, audio_tagging_loss=0.008159, over 15033.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08947, pruned_loss=0.01216, audio_tagging_loss=0.008732, over 3050159.14 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:54:22,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3419073.3333333335, ans=0.5 2023-11-26 13:54:25,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3419073.3333333335, ans=0.0 2023-11-26 13:54:41,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3419140.0, ans=0.125 2023-11-26 13:54:41,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3419140.0, ans=0.0 2023-11-26 13:54:50,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-26 13:55:02,783 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512900 2023-11-26 13:55:06,488 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7900, loss[loss=0.08869, simple_loss=0.1197, pruned_loss=0.01982, audio_tagging_loss=0.009032, over 15059.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08984, pruned_loss=0.01223, audio_tagging_loss=0.008746, over 3053944.39 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:55:12,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 9.008e+01 9.612e+01 1.015e+02 1.376e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 13:55:14,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3419340.0, ans=0.125 2023-11-26 13:55:16,847 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:55:19,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3419406.6666666665, ans=0.2 2023-11-26 13:55:26,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3419406.6666666665, ans=0.125 2023-11-26 13:55:36,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3419473.3333333335, ans=0.125 2023-11-26 13:55:57,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3419606.6666666665, ans=0.0 2023-11-26 13:55:59,422 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512950 2023-11-26 13:56:03,123 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7950, loss[loss=0.06242, simple_loss=0.08312, pruned_loss=0.01478, audio_tagging_loss=0.006083, over 14735.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08977, pruned_loss=0.01238, audio_tagging_loss=0.008824, over 3054494.24 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:56:17,938 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 13:56:40,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3419873.3333333335, ans=0.125 2023-11-26 13:56:54,988 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513000 2023-11-26 13:56:58,395 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8000, loss[loss=0.05949, simple_loss=0.08229, pruned_loss=0.01002, audio_tagging_loss=0.008322, over 14911.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09003, pruned_loss=0.01248, audio_tagging_loss=0.008894, over 3054523.20 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:57:03,736 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 8.642e+01 9.203e+01 9.908e+01 1.245e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 13:57:14,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-26 13:57:19,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3420073.3333333335, ans=0.0 2023-11-26 13:57:50,833 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513050 2023-11-26 13:57:54,565 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8050, loss[loss=0.04049, simple_loss=0.04424, pruned_loss=0.00603, audio_tagging_loss=0.01234, over 16546.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08993, pruned_loss=0.01241, audio_tagging_loss=0.008867, over 3052975.82 frames. ], batch size: 68, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:58:09,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3420406.6666666665, ans=0.1 2023-11-26 13:58:35,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3420540.0, ans=0.125 2023-11-26 13:58:46,643 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513100 2023-11-26 13:58:48,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3420673.3333333335, ans=0.125 2023-11-26 13:58:50,316 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8100, loss[loss=0.06618, simple_loss=0.09068, pruned_loss=0.0136, audio_tagging_loss=0.007237, over 15074.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08979, pruned_loss=0.01228, audio_tagging_loss=0.008804, over 3050691.89 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:58:56,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.587e+01 9.236e+01 9.771e+01 1.279e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 13:58:57,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3420673.3333333335, ans=0.1 2023-11-26 13:59:05,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3420740.0, ans=0.125 2023-11-26 13:59:20,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3420806.6666666665, ans=0.2 2023-11-26 13:59:21,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.35 vs. 
limit=10.0 2023-11-26 13:59:35,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.75 vs. limit=15.0 2023-11-26 13:59:43,033 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513150 2023-11-26 13:59:46,088 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8150, loss[loss=0.05912, simple_loss=0.07564, pruned_loss=0.01106, audio_tagging_loss=0.01024, over 15128.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09002, pruned_loss=0.01241, audio_tagging_loss=0.00878, over 3052876.92 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:00:09,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=12.0 2023-11-26 14:00:21,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=12.0 2023-11-26 14:00:37,940 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513200 2023-11-26 14:00:41,337 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8200, loss[loss=0.06951, simple_loss=0.0967, pruned_loss=0.01103, audio_tagging_loss=0.01014, over 14911.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09019, pruned_loss=0.01245, audio_tagging_loss=0.008712, over 3053164.69 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:00:43,495 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:00:47,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.881e+01 9.462e+01 1.023e+02 1.490e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 14:00:57,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3421406.6666666665, ans=0.2 2023-11-26 14:01:01,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3421406.6666666665, ans=0.0 2023-11-26 14:01:08,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3421473.3333333335, ans=0.0 2023-11-26 14:01:10,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2023-11-26 14:01:12,429 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:01:25,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3421606.6666666665, ans=0.0 2023-11-26 14:01:34,705 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513250 2023-11-26 14:01:37,857 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8250, loss[loss=0.07477, simple_loss=0.1008, pruned_loss=0.0165, audio_tagging_loss=0.007846, over 14947.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09075, pruned_loss=0.01258, audio_tagging_loss=0.008652, over 3051995.08 frames. 
], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:01:40,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3421673.3333333335, ans=0.0 2023-11-26 14:01:43,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3421673.3333333335, ans=0.0 2023-11-26 14:01:53,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5 2023-11-26 14:02:11,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3421873.3333333335, ans=0.125 2023-11-26 14:02:11,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-26 14:02:17,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3421873.3333333335, ans=0.125 2023-11-26 14:02:19,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3421873.3333333335, ans=0.07 2023-11-26 14:02:20,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.18 vs. limit=10.0 2023-11-26 14:02:30,602 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513300 2023-11-26 14:02:34,279 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8300, loss[loss=0.07525, simple_loss=0.1108, pruned_loss=0.01278, audio_tagging_loss=0.007052, over 14925.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09104, pruned_loss=0.01249, audio_tagging_loss=0.008646, over 3057288.39 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:02:40,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.839e+01 9.487e+01 1.004e+02 1.588e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 14:02:44,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.62 vs. limit=10.0 2023-11-26 14:02:48,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3422073.3333333335, ans=0.125 2023-11-26 14:03:11,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3422206.6666666665, ans=0.2 2023-11-26 14:03:15,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3422206.6666666665, ans=0.125 2023-11-26 14:03:16,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.37 vs. 
limit=12.0 2023-11-26 14:03:16,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3422206.6666666665, ans=0.1 2023-11-26 14:03:17,995 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:03:19,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3422273.3333333335, ans=0.125 2023-11-26 14:03:26,290 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513350 2023-11-26 14:03:29,419 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8350, loss[loss=0.07962, simple_loss=0.1173, pruned_loss=0.01422, audio_tagging_loss=0.006762, over 15469.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09114, pruned_loss=0.01247, audio_tagging_loss=0.008566, over 3052082.83 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 14:03:29,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3422340.0, ans=0.125 2023-11-26 14:03:31,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3422340.0, ans=0.0 2023-11-26 14:03:53,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3422473.3333333335, ans=10.0 2023-11-26 14:03:56,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3422473.3333333335, ans=0.125 2023-11-26 14:03:58,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.94 vs. limit=15.0 2023-11-26 14:04:22,049 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513400 2023-11-26 14:04:25,970 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8400, loss[loss=0.04873, simple_loss=0.06382, pruned_loss=0.007931, audio_tagging_loss=0.00889, over 14566.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09075, pruned_loss=0.01239, audio_tagging_loss=0.008571, over 3053256.56 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:04:29,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.67 vs. limit=10.0 2023-11-26 14:04:33,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.557e+01 9.224e+01 9.865e+01 1.202e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 14:04:50,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3422806.6666666665, ans=0.0 2023-11-26 14:04:55,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.46 vs. 
limit=15.0 2023-11-26 14:04:59,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3422873.3333333335, ans=0.125 2023-11-26 14:05:03,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3422873.3333333335, ans=0.2 2023-11-26 14:05:17,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=22.5 2023-11-26 14:05:18,110 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513450 2023-11-26 14:05:21,187 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8450, loss[loss=0.05682, simple_loss=0.07071, pruned_loss=0.01025, audio_tagging_loss=0.01121, over 15332.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09006, pruned_loss=0.01225, audio_tagging_loss=0.008573, over 3051734.58 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:05:43,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3423140.0, ans=0.125 2023-11-26 14:06:13,895 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513500 2023-11-26 14:06:17,005 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8500, loss[loss=0.07952, simple_loss=0.1131, pruned_loss=0.01655, audio_tagging_loss=0.006403, over 16642.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09027, pruned_loss=0.01247, audio_tagging_loss=0.008649, over 3053641.24 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:06:17,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3423340.0, ans=0.125 2023-11-26 14:06:19,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3423340.0, ans=0.125 2023-11-26 14:06:24,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.916e+01 9.646e+01 1.037e+02 1.510e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 14:07:09,549 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513550 2023-11-26 14:07:10,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3423606.6666666665, ans=0.1 2023-11-26 14:07:12,641 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8550, loss[loss=0.05509, simple_loss=0.07104, pruned_loss=0.01073, audio_tagging_loss=0.008839, over 13252.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.0893, pruned_loss=0.01218, audio_tagging_loss=0.008653, over 3046194.59 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:07:20,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3423673.3333333335, ans=0.125 2023-11-26 14:07:20,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3423673.3333333335, ans=0.09899494936611666 2023-11-26 14:08:05,914 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513600 2023-11-26 14:08:09,289 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8600, loss[loss=0.07085, simple_loss=0.1036, pruned_loss=0.01295, audio_tagging_loss=0.00608, over 15712.00 frames. 
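], tot_loss[loss=0.06492, simple_loss=0.08835, pruned_loss=0.01205, audio_tagging_loss=0.008689, over 3044277.05 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0

The loss[...] / tot_loss[...] entries above decompose into the transducer's simple and pruned losses plus the audio-tagging distillation term. With the scales from this run's startup configuration (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0), the logged totals are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. A minimal sketch checking that against the batch 8600 running averages (variable names here are illustrative, not the training script's own):

    # Hedged reconstruction of the logged total from its components.
    # The exact combination lives in the icefall model code and may
    # differ in detail (e.g. how the simple/pruned terms are ramped).
    SIMPLE_LOSS_SCALE = 0.5
    AUDIO_TAGGING_LOSS_SCALE = 1.0

    def total_loss(simple_loss, pruned_loss, audio_tagging_loss):
        return (SIMPLE_LOSS_SCALE * simple_loss
                + pruned_loss
                + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

    # tot_loss components for batch 8600 from the record above:
    assert abs(total_loss(0.08835, 0.01205, 0.008689) - 0.06492) < 1e-4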
2023-11-26 14:08:16,719 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.755e+01 9.267e+01 1.010e+02 1.487e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-26 14:08:45,796 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:09:01,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513650
2023-11-26 14:09:05,089 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8650, loss[loss=0.06443, simple_loss=0.0933, pruned_loss=0.0104, audio_tagging_loss=0.007371, over 17161.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08829, pruned_loss=0.01198, audio_tagging_loss=0.008727, over 3050698.41 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:09:20,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3424406.6666666665, ans=0.0
2023-11-26 14:09:39,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3424540.0, ans=0.125
2023-11-26 14:09:42,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.81 vs. limit=15.0
2023-11-26 14:09:43,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3424540.0, ans=0.125
2023-11-26 14:09:56,947 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513700
2023-11-26 14:10:00,530 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8700, loss[loss=0.07771, simple_loss=0.1078, pruned_loss=0.01413, audio_tagging_loss=0.009678, over 15188.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08981, pruned_loss=0.01216, audio_tagging_loss=0.008752, over 3056924.23 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:10:08,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.016e+01 8.732e+01 9.410e+01 1.013e+02 1.633e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-26 14:10:12,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3424740.0, ans=0.0
2023-11-26 14:10:14,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0
2023-11-26 14:10:19,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3424740.0, ans=0.5
2023-11-26 14:10:50,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3424940.0, ans=0.2
2023-11-26 14:10:53,871 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513750
2023-11-26 14:10:57,022 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8750, loss[loss=0.08348, simple_loss=0.1253, pruned_loss=0.01319, audio_tagging_loss=0.007637, over 15699.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09, pruned_loss=0.01224, audio_tagging_loss=0.008774, over 3052810.00 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
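The optim.py:476 lines report gradient-norm statistics over a recent window: five values that read like min/Q1/median/Q3/max, plus a clipping threshold. In every record here the threshold equals Clipping_scale times the median (e.g. 2.0 * 9.267e+01 = 1.853e+02), so a plausible sketch of the diagnostic is (an illustration, not optim.py's actual code):

    # Hedged sketch: quartiles of recent gradient norms and a clipping
    # threshold set at Clipping_scale times the median.
    import torch

    def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        q = torch.quantile(grad_norms.float(),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped

percent-clipped stays at 0.0 whenever the largest recent grad norm (the fifth value) is below the threshold, as in the windows above.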
2023-11-26 14:11:05,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3425006.6666666665, ans=0.125
2023-11-26 14:11:13,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3425073.3333333335, ans=0.125
2023-11-26 14:11:19,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0
2023-11-26 14:11:21,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425140.0, ans=0.1
2023-11-26 14:11:49,021 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513800
2023-11-26 14:11:52,397 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8800, loss[loss=0.06309, simple_loss=0.08577, pruned_loss=0.01167, audio_tagging_loss=0.008539, over 14491.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09004, pruned_loss=0.01224, audio_tagging_loss=0.008862, over 3057763.01 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:12:00,262 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.965e+01 9.351e+01 9.840e+01 1.391e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-26 14:12:00,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425340.0, ans=0.1
2023-11-26 14:12:08,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3425406.6666666665, ans=0.125
2023-11-26 14:12:24,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425473.3333333335, ans=0.1
2023-11-26 14:12:30,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3425540.0, ans=0.0
2023-11-26 14:12:31,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3425540.0, ans=0.0
2023-11-26 14:12:33,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3425540.0, ans=0.125
2023-11-26 14:12:44,796 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513850
2023-11-26 14:12:45,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=22.5
2023-11-26 14:12:48,468 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8850, loss[loss=0.06072, simple_loss=0.07992, pruned_loss=0.01306, audio_tagging_loss=0.007696, over 13943.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09035, pruned_loss=0.01237, audio_tagging_loss=0.008874, over 3049578.56 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:13:00,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3425740.0, ans=0.125
2023-11-26 14:13:01,200 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
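The WARNING above shows the datamodule dropping an AudioSet clip whose placeholder transcript cannot be aligned: after the encoder's roughly 4x subsampling a 100-frame input yields only 23 output frames, fewer than the 24 BPE tokens, and a transducer needs at least one encoder frame per token. A hedged sketch of such a filter (the subsampling formula below mirrors the logged 100 -> 23 reduction but is an approximation of the frontend):

    # Hedged sketch of the length check behind these WARNING records.
    def frames_after_subsampling(num_frames: int) -> int:
        # Approximates the conv frontend's ~4x reduction (assumption).
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Need at least as many encoder frames as target tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the dummy-text AudioSet cuts above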
2023-11-26 14:13:03,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.93 vs. limit=15.0
2023-11-26 14:13:06,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3425740.0, ans=0.05
2023-11-26 14:13:16,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3425806.6666666665, ans=0.125
2023-11-26 14:13:21,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.23 vs. limit=22.5
2023-11-26 14:13:24,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5
2023-11-26 14:13:40,750 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513900
2023-11-26 14:13:42,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3425940.0, ans=0.0
2023-11-26 14:13:44,400 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8900, loss[loss=0.07548, simple_loss=0.1026, pruned_loss=0.01506, audio_tagging_loss=0.009119, over 15890.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09123, pruned_loss=0.01253, audio_tagging_loss=0.008714, over 3056549.42 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:13:52,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.670e+01 9.413e+01 1.057e+02 1.382e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-26 14:13:58,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3426073.3333333335, ans=0.1
2023-11-26 14:14:07,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. limit=10.0
2023-11-26 14:14:28,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0
2023-11-26 14:14:36,313 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513950
2023-11-26 14:14:39,370 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8950, loss[loss=0.07809, simple_loss=0.1054, pruned_loss=0.01821, audio_tagging_loss=0.007192, over 14256.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09056, pruned_loss=0.01251, audio_tagging_loss=0.008664, over 3056317.09 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:14:43,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3426340.0, ans=0.025
2023-11-26 14:15:13,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0
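The Whitening lines are diagnostics from scaling.py on how "white" (isotropic) a module's activation covariance is within each group: the metric is 1.0 for a perfectly white covariance and grows as variance concentrates in a few directions, and the whitening penalty only engages when the metric exceeds the scheduled limit. A rough illustration of a metric with this shape (not icefall's exact formula; see scaling.py for that):

    # Hedged illustration: eigenvalue-dispersion metric of an activation
    # covariance, ~1.0 when white, larger when ill-conditioned.
    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one group
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        return d * (cov * cov).sum() / cov.trace() ** 2

    x = torch.randn(10000, 384)  # near-white activations
    print(whitening_metric(x))   # ~1.0, well under limits like 15.0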
2023-11-26 14:15:20,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3426540.0, ans=0.0
2023-11-26 14:15:22,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3426606.6666666665, ans=0.0
2023-11-26 14:15:26,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3426606.6666666665, ans=0.2
2023-11-26 14:15:30,702 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514000
2023-11-26 14:15:34,073 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9000, loss[loss=0.08147, simple_loss=0.118, pruned_loss=0.01741, audio_tagging_loss=0.005078, over 15877.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09109, pruned_loss=0.01272, audio_tagging_loss=0.008588, over 3051844.25 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:15:34,073 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-26 14:16:06,626 INFO [train_asr.py:1267] (1/4) Epoch 43, validation: loss=0.05882, simple_loss=0.0506, pruned_loss=0.005335, audio_tagging_loss=0.02819, over 4681554.00 frames.
2023-11-26 14:16:06,627 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-26 14:16:15,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.126e+01 8.986e+01 9.503e+01 1.043e+02 1.217e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-26 14:16:27,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3426806.6666666665, ans=0.125
2023-11-26 14:16:31,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.06 vs. limit=22.5
2023-11-26 14:16:36,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0
2023-11-26 14:16:40,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3426873.3333333335, ans=0.125
2023-11-26 14:16:46,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3426873.3333333335, ans=0.125
2023-11-26 14:16:58,888 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514050
2023-11-26 14:17:00,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3426940.0, ans=0.0
2023-11-26 14:17:01,971 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9050, loss[loss=0.0608, simple_loss=0.08612, pruned_loss=0.01018, audio_tagging_loss=0.007555, over 14838.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09003, pruned_loss=0.01256, audio_tagging_loss=0.008612, over 3049912.97 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0
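The validation block above (batch 9000) fires on the run's valid_interval of 3000 batches: the loop computes validation loss over the AudioSet eval cuts and logs peak GPU memory before resuming training. A minimal sketch of that cadence (assuming the interval check is a plain modulus, which matches the batches where these blocks appear):

    # Hedged sketch of the periodic-validation cadence seen in the log.
    VALID_INTERVAL = 3000  # from this run's startup configuration

    def should_validate(batch_idx: int) -> bool:
        return batch_idx % VALID_INTERVAL == 0

    assert should_validate(9000)
    assert not should_validate(9050)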
2023-11-26 14:17:09,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3427006.6666666665, ans=0.125
2023-11-26 14:17:20,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3427073.3333333335, ans=0.0
2023-11-26 14:17:54,112 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514100
2023-11-26 14:17:54,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3427273.3333333335, ans=0.125
2023-11-26 14:17:57,789 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9100, loss[loss=0.07857, simple_loss=0.1102, pruned_loss=0.01798, audio_tagging_loss=0.005467, over 14910.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09099, pruned_loss=0.0126, audio_tagging_loss=0.008522, over 3047283.72 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:18:07,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.883e+01 9.542e+01 1.028e+02 1.451e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-26 14:18:16,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0
2023-11-26 14:18:18,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3427406.6666666665, ans=0.2
2023-11-26 14:18:22,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3427473.3333333335, ans=0.125
2023-11-26 14:18:26,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3427473.3333333335, ans=0.2
2023-11-26 14:18:29,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3427473.3333333335, ans=0.1
2023-11-26 14:18:33,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3427540.0, ans=0.125
2023-11-26 14:18:51,492 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514150
2023-11-26 14:18:55,204 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9150, loss[loss=0.06544, simple_loss=0.09011, pruned_loss=0.01393, audio_tagging_loss=0.00645, over 14129.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09024, pruned_loss=0.01242, audio_tagging_loss=0.008502, over 3046362.99 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:18:58,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0
2023-11-26 14:19:06,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0
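The grad_scale value attached to each tot_loss record is the AMP loss scale for this fp16 run, and it steps between 8, 16 and 32 across these batches. That pattern is what torch's GradScaler produces: the scale is halved whenever a step overflows to inf/nan and doubled again after a long enough run of clean steps. A sketch with the stock PyTorch API (the constructor arguments shown are GradScaler's own knobs, not values verified from the training script):

    # Hedged sketch: the dynamic loss scaling behind the grad_scale column.
    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,     # a scale of the magnitude logged here
        growth_factor=2.0,   # doubling gives the 8 -> 16 -> 32 climbs
        backoff_factor=0.5,  # halving gives the 32 -> 16 -> 8 drops
        growth_interval=2000,
    )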
2023-11-26 14:19:42,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3427940.0, ans=0.125
2023-11-26 14:19:43,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3427940.0, ans=0.09899494936611666
2023-11-26 14:19:46,764 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514200
2023-11-26 14:19:50,129 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9200, loss[loss=0.08055, simple_loss=0.1175, pruned_loss=0.01641, audio_tagging_loss=0.005408, over 15054.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09052, pruned_loss=0.01242, audio_tagging_loss=0.008456, over 3046376.61 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:19:58,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.729e+01 9.387e+01 1.004e+02 1.309e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-26 14:20:07,038 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:20:08,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3428073.3333333335, ans=0.0
2023-11-26 14:20:15,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0
2023-11-26 14:20:29,934 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:20:37,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3428273.3333333335, ans=10.0
2023-11-26 14:20:42,630 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514250
2023-11-26 14:20:45,751 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9250, loss[loss=0.06703, simple_loss=0.08555, pruned_loss=0.01684, audio_tagging_loss=0.007417, over 14734.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09003, pruned_loss=0.0124, audio_tagging_loss=0.008424, over 3052713.13 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:20:54,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3428340.0, ans=0.05
2023-11-26 14:21:38,506 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514300
2023-11-26 14:21:42,682 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9300, loss[loss=0.08259, simple_loss=0.1188, pruned_loss=0.01752, audio_tagging_loss=0.005662, over 15393.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09052, pruned_loss=0.01239, audio_tagging_loss=0.008428, over 3055644.45 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0
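Most of the scaling.py:213 traffic above is ScheduledFloat values: hyperparameters (skip rates, balancer probabilities, dropout) that are functions of batch_count rather than constants. The zipformer code defines them by breakpoints and interpolates between them; a minimal re-implementation of that idea (breakpoints below are invented for illustration):

    # Hedged sketch of a ScheduledFloat-style schedule: piecewise-linear
    # interpolation of a hyperparameter over batch_count.
    def scheduled_float(batch_count, points):
        points = sorted(points)
        if batch_count <= points[0][0]:
            return points[0][1]
        if batch_count >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate decaying from 0.1 to 0.0 over the first 20k batches,
    # long since flat at 0.0 by batch_count ~3.4e6 as in the lines above:
    print(scheduled_float(3427940.0, [(0.0, 0.1), (20000.0, 0.0)]))  # 0.0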
2023-11-26 14:21:51,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.838e+01 9.271e+01 1.020e+02 1.264e+02, threshold=1.854e+02, percent-clipped=0.0
2023-11-26 14:21:56,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3428740.0, ans=0.125
2023-11-26 14:21:57,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3428740.0, ans=0.0
2023-11-26 14:22:06,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3428806.6666666665, ans=0.125
2023-11-26 14:22:33,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3428940.0, ans=0.125
2023-11-26 14:22:35,729 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514350
2023-11-26 14:22:38,795 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9350, loss[loss=0.06243, simple_loss=0.08109, pruned_loss=0.01022, audio_tagging_loss=0.01167, over 15287.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09056, pruned_loss=0.01246, audio_tagging_loss=0.008534, over 3054353.29 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:22:40,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0
2023-11-26 14:22:47,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3429006.6666666665, ans=0.0
2023-11-26 14:22:58,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3429073.3333333335, ans=0.1
2023-11-26 14:22:58,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3429073.3333333335, ans=0.0
2023-11-26 14:23:10,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3429140.0, ans=0.95
2023-11-26 14:23:10,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3429140.0, ans=0.125
2023-11-26 14:23:30,918 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514400
2023-11-26 14:23:34,347 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9400, loss[loss=0.07472, simple_loss=0.1062, pruned_loss=0.01398, audio_tagging_loss=0.007647, over 15762.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09056, pruned_loss=0.01239, audio_tagging_loss=0.008608, over 3054124.68 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:23:39,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5
2023-11-26 14:23:44,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.705e+01 9.718e+01 1.044e+02 1.326e+02, threshold=1.944e+02, percent-clipped=0.0
2023-11-26 14:23:57,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.57 vs.
limit=15.0 2023-11-26 14:24:20,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3429606.6666666665, ans=0.0 2023-11-26 14:24:27,205 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514450 2023-11-26 14:24:29,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3429673.3333333335, ans=0.125 2023-11-26 14:24:30,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=22.5 2023-11-26 14:24:30,864 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9450, loss[loss=0.07159, simple_loss=0.0951, pruned_loss=0.01499, audio_tagging_loss=0.009049, over 15213.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09047, pruned_loss=0.01235, audio_tagging_loss=0.008679, over 3053980.78 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:24:30,906 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:25:00,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3429806.6666666665, ans=0.035 2023-11-26 14:25:03,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3429873.3333333335, ans=0.2 2023-11-26 14:25:07,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3429873.3333333335, ans=0.0 2023-11-26 14:25:11,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=12.0 2023-11-26 14:25:22,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-26 14:25:23,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514500 2023-11-26 14:25:27,580 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9500, loss[loss=0.07219, simple_loss=0.09468, pruned_loss=0.01416, audio_tagging_loss=0.01069, over 15412.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09021, pruned_loss=0.01223, audio_tagging_loss=0.008784, over 3055476.10 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:25:35,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3430006.6666666665, ans=0.0 2023-11-26 14:25:37,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.962e+01 9.623e+01 1.023e+02 1.293e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 14:26:19,727 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514550 2023-11-26 14:26:22,799 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9550, loss[loss=0.06664, simple_loss=0.0911, pruned_loss=0.01025, audio_tagging_loss=0.01085, over 15438.00 frames. 
], tot_loss[loss=0.06627, simple_loss=0.09054, pruned_loss=0.01213, audio_tagging_loss=0.008875, over 3052077.09 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:26:23,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3430340.0, ans=0.0 2023-11-26 14:26:25,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3430340.0, ans=0.1 2023-11-26 14:26:33,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3430406.6666666665, ans=0.0 2023-11-26 14:26:33,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3430406.6666666665, ans=0.0 2023-11-26 14:26:35,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-26 14:26:37,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3430406.6666666665, ans=0.07 2023-11-26 14:26:55,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3430473.3333333335, ans=0.0 2023-11-26 14:27:00,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3430540.0, ans=0.95 2023-11-26 14:27:15,632 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514600 2023-11-26 14:27:18,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3430673.3333333335, ans=0.05 2023-11-26 14:27:18,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3430673.3333333335, ans=0.04949747468305833 2023-11-26 14:27:19,033 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9600, loss[loss=0.06647, simple_loss=0.08625, pruned_loss=0.01316, audio_tagging_loss=0.01018, over 15773.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09, pruned_loss=0.01209, audio_tagging_loss=0.008963, over 3051283.16 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:27:26,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2023-11-26 14:27:28,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2023-11-26 14:27:28,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=22.5 2023-11-26 14:27:29,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.862e+01 9.478e+01 1.011e+02 2.091e+02, threshold=1.896e+02, percent-clipped=1.0 2023-11-26 14:27:32,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=15.0 2023-11-26 14:27:57,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.19 vs. 
limit=10.0 2023-11-26 14:28:12,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514650 2023-11-26 14:28:12,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3430940.0, ans=0.125 2023-11-26 14:28:15,762 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9650, loss[loss=0.04373, simple_loss=0.05702, pruned_loss=0.006481, audio_tagging_loss=0.008735, over 15912.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08986, pruned_loss=0.01205, audio_tagging_loss=0.008963, over 3048624.11 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:28:22,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3431006.6666666665, ans=0.125 2023-11-26 14:28:26,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3431073.3333333335, ans=0.0 2023-11-26 14:28:38,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3431140.0, ans=0.0 2023-11-26 14:29:04,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3431273.3333333335, ans=0.1 2023-11-26 14:29:08,409 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514700 2023-11-26 14:29:11,523 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9700, loss[loss=0.04849, simple_loss=0.06569, pruned_loss=0.007719, audio_tagging_loss=0.007924, over 15048.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.0898, pruned_loss=0.01224, audio_tagging_loss=0.008856, over 3039355.16 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:29:21,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.878e+01 9.480e+01 1.018e+02 1.289e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 14:29:25,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3431406.6666666665, ans=0.125 2023-11-26 14:29:39,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3431473.3333333335, ans=0.1 2023-11-26 14:29:39,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3431473.3333333335, ans=0.2 2023-11-26 14:29:43,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2023-11-26 14:29:56,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=12.0 2023-11-26 14:29:59,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3431606.6666666665, ans=0.2 2023-11-26 14:30:04,725 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514750 2023-11-26 14:30:07,847 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9750, loss[loss=0.06133, simple_loss=0.08288, pruned_loss=0.01177, audio_tagging_loss=0.008115, over 16284.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08944, pruned_loss=0.01224, audio_tagging_loss=0.008802, over 3040753.00 frames. 
], batch size: 61, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:30:15,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3431673.3333333335, ans=0.125 2023-11-26 14:30:20,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3431740.0, ans=0.125 2023-11-26 14:30:31,813 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:30:43,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3431873.3333333335, ans=0.0 2023-11-26 14:30:51,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3431940.0, ans=0.125 2023-11-26 14:31:01,244 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514800 2023-11-26 14:31:04,632 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9800, loss[loss=0.06861, simple_loss=0.09065, pruned_loss=0.01313, audio_tagging_loss=0.01017, over 16717.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08862, pruned_loss=0.01205, audio_tagging_loss=0.008757, over 3044760.57 frames. ], batch size: 64, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:31:06,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3432006.6666666665, ans=0.125 2023-11-26 14:31:13,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3432006.6666666665, ans=0.125 2023-11-26 14:31:14,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.979e+01 9.504e+01 1.025e+02 1.204e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 14:31:21,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3432073.3333333335, ans=0.125 2023-11-26 14:31:28,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0 2023-11-26 14:31:29,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3432140.0, ans=0.0 2023-11-26 14:31:50,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.79 vs. limit=10.0 2023-11-26 14:31:56,032 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:31:57,135 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514850 2023-11-26 14:32:00,300 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9850, loss[loss=0.07069, simple_loss=0.09929, pruned_loss=0.01304, audio_tagging_loss=0.008001, over 15645.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08857, pruned_loss=0.01202, audio_tagging_loss=0.008765, over 3044823.74 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:32:02,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3432340.0, ans=0.125 2023-11-26 14:32:06,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3432340.0, ans=0.0 2023-11-26 14:32:06,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2023-11-26 14:32:22,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3432473.3333333335, ans=0.0 2023-11-26 14:32:24,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3432473.3333333335, ans=0.125 2023-11-26 14:32:29,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3432473.3333333335, ans=0.07 2023-11-26 14:32:32,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3432473.3333333335, ans=0.125 2023-11-26 14:32:32,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3432473.3333333335, ans=0.07 2023-11-26 14:32:32,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3432473.3333333335, ans=0.1 2023-11-26 14:32:52,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3432606.6666666665, ans=0.125 2023-11-26 14:32:53,704 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514900 2023-11-26 14:32:56,776 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9900, loss[loss=0.04039, simple_loss=0.0492, pruned_loss=0.004708, audio_tagging_loss=0.01109, over 15287.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08816, pruned_loss=0.01186, audio_tagging_loss=0.00869, over 3039374.00 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:33:03,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3432673.3333333335, ans=0.125 2023-11-26 14:33:05,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3432673.3333333335, ans=0.2 2023-11-26 14:33:07,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.992e+01 8.539e+01 9.208e+01 1.007e+02 1.176e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 14:33:27,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3432806.6666666665, ans=0.125 2023-11-26 14:33:35,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0 2023-11-26 14:33:40,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2023-11-26 14:33:47,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.52 vs. 
limit=15.0 2023-11-26 14:33:50,443 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514950 2023-11-26 14:33:53,512 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9950, loss[loss=0.06874, simple_loss=0.08247, pruned_loss=0.01423, audio_tagging_loss=0.01328, over 15483.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08884, pruned_loss=0.01205, audio_tagging_loss=0.008672, over 3052229.11 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:34:21,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3433140.0, ans=0.1 2023-11-26 14:34:25,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2023-11-26 14:34:29,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3433206.6666666665, ans=0.2 2023-11-26 14:34:31,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3433206.6666666665, ans=0.1 2023-11-26 14:34:45,649 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515000 2023-11-26 14:34:49,102 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10000, loss[loss=0.06971, simple_loss=0.09867, pruned_loss=0.01152, audio_tagging_loss=0.008853, over 15264.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08877, pruned_loss=0.01194, audio_tagging_loss=0.008634, over 3054480.86 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:34:59,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.778e+01 9.390e+01 1.020e+02 2.265e+02, threshold=1.878e+02, percent-clipped=1.0 2023-11-26 14:35:14,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3433473.3333333335, ans=0.125 2023-11-26 14:35:32,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3433606.6666666665, ans=0.1 2023-11-26 14:35:39,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3433606.6666666665, ans=0.1 2023-11-26 14:35:41,077 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515050 2023-11-26 14:35:45,415 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10050, loss[loss=0.05919, simple_loss=0.08201, pruned_loss=0.009759, audio_tagging_loss=0.008429, over 14528.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08944, pruned_loss=0.01188, audio_tagging_loss=0.008694, over 3053714.45 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:35:48,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3433673.3333333335, ans=0.125 2023-11-26 14:36:00,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3433740.0, ans=0.125 2023-11-26 14:36:27,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. 
limit=6.0 2023-11-26 14:36:37,439 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515100 2023-11-26 14:36:41,171 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10100, loss[loss=0.07557, simple_loss=0.109, pruned_loss=0.01353, audio_tagging_loss=0.007559, over 15347.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08928, pruned_loss=0.01191, audio_tagging_loss=0.008716, over 3058556.08 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:36:43,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2023-11-26 14:36:44,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3434006.6666666665, ans=0.125 2023-11-26 14:36:51,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3434073.3333333335, ans=0.0 2023-11-26 14:36:51,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.526e+01 9.238e+01 9.912e+01 1.286e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 14:37:28,176 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:37:30,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3434273.3333333335, ans=0.125 2023-11-26 14:37:33,539 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515150 2023-11-26 14:37:36,678 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10150, loss[loss=0.05476, simple_loss=0.07549, pruned_loss=0.007871, audio_tagging_loss=0.009143, over 15661.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08976, pruned_loss=0.0121, audio_tagging_loss=0.008716, over 3057459.80 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:37:47,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3434406.6666666665, ans=0.1 2023-11-26 14:37:49,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=3434406.6666666665, ans=0.1 2023-11-26 14:37:51,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3434406.6666666665, ans=0.2 2023-11-26 14:37:52,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3434406.6666666665, ans=0.125 2023-11-26 14:37:59,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3434473.3333333335, ans=0.125 2023-11-26 14:38:05,428 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:38:20,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3434606.6666666665, ans=0.125 2023-11-26 14:38:28,844 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515200 2023-11-26 14:38:28,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3434606.6666666665, ans=0.125 2023-11-26 14:38:32,199 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10200, loss[loss=0.09105, simple_loss=0.131, pruned_loss=0.01734, audio_tagging_loss=0.008207, over 16138.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.0903, pruned_loss=0.01232, audio_tagging_loss=0.008847, over 3063429.78 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:38:34,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3434673.3333333335, ans=0.125 2023-11-26 14:38:44,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 8.939e+01 9.563e+01 1.037e+02 1.347e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 14:38:52,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2023-11-26 14:38:52,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3434740.0, ans=0.125 2023-11-26 14:38:55,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3434806.6666666665, ans=0.1 2023-11-26 14:38:55,893 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:39:06,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3434873.3333333335, ans=0.125 2023-11-26 14:39:19,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3434940.0, ans=0.0 2023-11-26 14:39:26,312 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515250 2023-11-26 14:39:29,455 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10250, loss[loss=0.04297, simple_loss=0.04715, pruned_loss=0.009424, audio_tagging_loss=0.00997, over 14806.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08978, pruned_loss=0.01235, audio_tagging_loss=0.008917, over 3060714.90 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:39:45,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3435073.3333333335, ans=0.125 2023-11-26 14:39:48,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3435073.3333333335, ans=0.0 2023-11-26 14:40:22,410 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515300 2023-11-26 14:40:25,496 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10300, loss[loss=0.06751, simple_loss=0.08894, pruned_loss=0.01259, audio_tagging_loss=0.01045, over 15204.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08997, pruned_loss=0.01242, audio_tagging_loss=0.008915, over 3060048.71 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:40:34,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=3435340.0, ans=10.0 2023-11-26 14:40:36,206 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.928e+01 8.849e+01 9.518e+01 1.017e+02 1.480e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 14:40:37,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.90 vs. limit=22.5 2023-11-26 14:40:43,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2023-11-26 14:40:49,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3435473.3333333335, ans=0.125 2023-11-26 14:41:05,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=12.0 2023-11-26 14:41:06,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3435540.0, ans=0.125 2023-11-26 14:41:18,047 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515350 2023-11-26 14:41:21,155 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10350, loss[loss=0.08194, simple_loss=0.1125, pruned_loss=0.01618, audio_tagging_loss=0.009485, over 15227.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09033, pruned_loss=0.01241, audio_tagging_loss=0.008964, over 3056740.65 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:41:26,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.82 vs. 
limit=22.5 2023-11-26 14:41:38,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3435740.0, ans=10.0 2023-11-26 14:41:43,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3435806.6666666665, ans=0.2 2023-11-26 14:41:45,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3435806.6666666665, ans=0.2 2023-11-26 14:41:48,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3435806.6666666665, ans=0.125 2023-11-26 14:41:51,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3435806.6666666665, ans=0.0 2023-11-26 14:42:11,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3435940.0, ans=0.0 2023-11-26 14:42:13,255 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515400 2023-11-26 14:42:17,226 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10400, loss[loss=0.05774, simple_loss=0.08461, pruned_loss=0.005898, audio_tagging_loss=0.009538, over 15921.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09002, pruned_loss=0.01232, audio_tagging_loss=0.009019, over 3056533.47 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:42:29,076 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.791e+01 9.464e+01 1.006e+02 1.363e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 14:42:48,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3436140.0, ans=0.125 2023-11-26 14:42:58,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3436206.6666666665, ans=0.1 2023-11-26 14:43:10,151 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515450 2023-11-26 14:43:10,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3436273.3333333335, ans=0.125 2023-11-26 14:43:10,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3436273.3333333335, ans=0.125 2023-11-26 14:43:13,302 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10450, loss[loss=0.05957, simple_loss=0.0792, pruned_loss=0.0112, audio_tagging_loss=0.008761, over 14883.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08959, pruned_loss=0.01216, audio_tagging_loss=0.008985, over 3052533.38 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:43:24,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2023-11-26 14:43:33,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3436406.6666666665, ans=0.0 2023-11-26 14:43:34,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.32 vs. 
limit=22.5 2023-11-26 14:43:36,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2023-11-26 14:44:05,408 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515500 2023-11-26 14:44:08,653 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10500, loss[loss=0.06455, simple_loss=0.09167, pruned_loss=0.01001, audio_tagging_loss=0.008697, over 15398.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09014, pruned_loss=0.0123, audio_tagging_loss=0.008741, over 3058766.35 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:44:20,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3436740.0, ans=0.125 2023-11-26 14:44:20,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 8.625e+01 9.527e+01 1.023e+02 1.211e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 14:44:23,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3436740.0, ans=0.125 2023-11-26 14:44:29,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3436740.0, ans=0.125 2023-11-26 14:44:39,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3436806.6666666665, ans=0.125 2023-11-26 14:44:46,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3436873.3333333335, ans=0.125 2023-11-26 14:45:01,589 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515550 2023-11-26 14:45:02,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3436940.0, ans=0.0 2023-11-26 14:45:04,722 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10550, loss[loss=0.06345, simple_loss=0.09195, pruned_loss=0.009677, audio_tagging_loss=0.007799, over 15439.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09023, pruned_loss=0.01229, audio_tagging_loss=0.008736, over 3059302.50 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:45:22,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3437073.3333333335, ans=0.2 2023-11-26 14:45:58,768 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515600 2023-11-26 14:46:02,169 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10600, loss[loss=0.06102, simple_loss=0.08375, pruned_loss=0.01021, audio_tagging_loss=0.008932, over 15343.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08981, pruned_loss=0.01213, audio_tagging_loss=0.008671, over 3053626.35 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:46:03,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0 2023-11-26 14:46:10,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. 
limit=6.0 2023-11-26 14:46:14,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.031e+01 9.613e+01 1.032e+02 1.237e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 14:46:17,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3437406.6666666665, ans=0.0 2023-11-26 14:46:29,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3437473.3333333335, ans=0.125 2023-11-26 14:46:43,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2023-11-26 14:46:54,450 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515650 2023-11-26 14:46:57,553 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10650, loss[loss=0.05144, simple_loss=0.06511, pruned_loss=0.008929, audio_tagging_loss=0.009959, over 14546.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09113, pruned_loss=0.01232, audio_tagging_loss=0.00856, over 3054276.45 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:47:05,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3437673.3333333335, ans=0.125 2023-11-26 14:47:12,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3437740.0, ans=0.1 2023-11-26 14:47:43,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3437940.0, ans=0.0 2023-11-26 14:47:50,314 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515700 2023-11-26 14:47:50,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3437940.0, ans=0.1 2023-11-26 14:47:53,419 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10700, loss[loss=0.05957, simple_loss=0.08075, pruned_loss=0.01003, audio_tagging_loss=0.009168, over 14901.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09107, pruned_loss=0.01238, audio_tagging_loss=0.00853, over 3052271.94 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:48:07,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.817e+01 9.509e+01 1.036e+02 1.497e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 14:48:19,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3438140.0, ans=0.125 2023-11-26 14:48:24,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2023-11-26 14:48:30,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3438206.6666666665, ans=0.125 2023-11-26 14:48:46,728 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515750 2023-11-26 14:48:49,892 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10750, loss[loss=0.04272, simple_loss=0.06311, pruned_loss=0.005106, audio_tagging_loss=0.006064, over 14828.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.0904, pruned_loss=0.01226, audio_tagging_loss=0.008487, over 3051027.87 frames. 
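An aside on the per-batch loss summaries above: the four reported fields are mutually consistent if the headline `loss` is the scaled sum used by this pruned-transducer-plus-audio-tagging setup, i.e. 0.5 x simple_loss + pruned_loss + 1.0 x audio_tagging_loss (0.5 and 1.0 being the run's configured scales). That this is the exact combination rule is an inference from the numbers, not from the recipe's code; a quick arithmetic check:

```python
# Hedged sketch: verify loss ~= 0.5*simple_loss + pruned_loss
#                              + 1.0*audio_tagging_loss
# against entries copied from the log above.
def combined(simple_loss, pruned_loss, audio_tagging_loss):
    return 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss

# "Epoch 43, batch 10300" entry: loss=0.06751
assert abs(combined(0.08894, 0.01259, 0.01045) - 0.06751) < 1e-4

# "Epoch 43, batch 10450" entry: loss=0.05957 (agrees to logged rounding)
assert abs(combined(0.0792, 0.0112, 0.008761) - 0.05957) < 1e-4
```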
], batch size: 56, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:48:52,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3438340.0, ans=0.125 2023-11-26 14:49:12,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2023-11-26 14:49:23,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3438540.0, ans=0.5 2023-11-26 14:49:36,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3438606.6666666665, ans=0.125 2023-11-26 14:49:42,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2023-11-26 14:49:42,678 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515800 2023-11-26 14:49:46,058 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10800, loss[loss=0.05274, simple_loss=0.06928, pruned_loss=0.007713, audio_tagging_loss=0.01038, over 14686.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08979, pruned_loss=0.01231, audio_tagging_loss=0.008531, over 3047203.60 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:49:55,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3438740.0, ans=0.2 2023-11-26 14:49:59,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.847e+01 9.608e+01 1.038e+02 1.531e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 14:50:03,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3438740.0, ans=0.0 2023-11-26 14:50:24,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3438873.3333333335, ans=0.125 2023-11-26 14:50:38,853 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515850 2023-11-26 14:50:40,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3438940.0, ans=0.1 2023-11-26 14:50:42,498 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10850, loss[loss=0.06246, simple_loss=0.08729, pruned_loss=0.01141, audio_tagging_loss=0.007413, over 14920.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08919, pruned_loss=0.01223, audio_tagging_loss=0.008537, over 3042101.77 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:50:55,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2023-11-26 14:51:01,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3439073.3333333335, ans=0.0 2023-11-26 14:51:35,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515900 2023-11-26 14:51:36,713 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:51:38,898 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10900, loss[loss=0.04225, simple_loss=0.06178, pruned_loss=0.002732, audio_tagging_loss=0.008633, over 13993.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08851, pruned_loss=0.01212, audio_tagging_loss=0.008681, over 3041763.90 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:51:47,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3439340.0, ans=0.125 2023-11-26 14:51:52,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 9.005e+01 9.638e+01 1.044e+02 1.421e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 14:52:04,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3439473.3333333335, ans=0.125 2023-11-26 14:52:07,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3439473.3333333335, ans=0.0 2023-11-26 14:52:12,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3439540.0, ans=0.125 2023-11-26 14:52:24,999 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:52:28,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3439606.6666666665, ans=0.125 2023-11-26 14:52:28,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439606.6666666665, ans=0.1 2023-11-26 14:52:29,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3439606.6666666665, ans=0.0 2023-11-26 14:52:31,173 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515950 2023-11-26 14:52:34,983 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10950, loss[loss=0.08408, simple_loss=0.1285, pruned_loss=0.01556, audio_tagging_loss=0.004269, over 15511.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08894, pruned_loss=0.01207, audio_tagging_loss=0.008596, over 3046814.59 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:52:39,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3439673.3333333335, ans=10.0 2023-11-26 14:52:59,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.45 vs. limit=22.5 2023-11-26 14:53:10,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3439873.3333333335, ans=0.125 2023-11-26 14:53:20,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3439940.0, ans=0.125 2023-11-26 14:53:27,465 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516000 2023-11-26 14:53:32,851 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11000, loss[loss=0.05425, simple_loss=0.073, pruned_loss=0.007701, audio_tagging_loss=0.01005, over 14611.00 frames. 
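The WARNING above (and the identical ones later in this excerpt) drops 1-second AudioSet placeholder clips: their 100 feature frames shrink to 23 after the roughly 4x convolutional subsampling, fewer than the 24 BPE tokens of the dummy transcript, and a transducer alignment needs at least one frame per token. A minimal sketch of such a filter follows; the exact front-end frame arithmetic is an assumption (it reproduces the logged 100 -> 23 but may differ from the real encoder_embed):

```python
def keep_for_transducer(num_frames: int, num_tokens: int,
                        subsampling_factor: int = 4) -> bool:
    """Drop cuts that cannot be aligned: an RNN-T/pruned-transducer loss
    needs at least as many subsampled frames as output tokens."""
    # Hypothetical subsampling arithmetic (assumption, not the recipe's code).
    frames_after = (num_frames - 7) // subsampling_factor
    return frames_after >= num_tokens

# The excluded cut above: 100 frames -> 23 after subsampling, 24 tokens.
assert (100 - 7) // 4 == 23
assert not keep_for_transducer(num_frames=100, num_tokens=24)
```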
], tot_loss[loss=0.06554, simple_loss=0.08964, pruned_loss=0.0121, audio_tagging_loss=0.008614, over 3044262.55 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:53:36,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3440006.6666666665, ans=0.125 2023-11-26 14:53:43,509 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:53:47,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.917e+01 8.791e+01 9.278e+01 1.005e+02 1.404e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 14:54:13,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0 2023-11-26 14:54:25,787 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516050 2023-11-26 14:54:25,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3440273.3333333335, ans=0.05 2023-11-26 14:54:29,438 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11050, loss[loss=0.05287, simple_loss=0.07184, pruned_loss=0.009058, audio_tagging_loss=0.00789, over 16313.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.0895, pruned_loss=0.01214, audio_tagging_loss=0.00866, over 3046014.72 frames. ], batch size: 62, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:54:31,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3440340.0, ans=0.1 2023-11-26 14:54:33,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3440340.0, ans=0.125 2023-11-26 14:54:58,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3440473.3333333335, ans=0.0 2023-11-26 14:55:17,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3440606.6666666665, ans=0.0 2023-11-26 14:55:21,598 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516100 2023-11-26 14:55:24,785 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11100, loss[loss=0.06293, simple_loss=0.08354, pruned_loss=0.01047, audio_tagging_loss=0.01069, over 15178.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08915, pruned_loss=0.01202, audio_tagging_loss=0.00872, over 3047037.65 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:55:38,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.786e+01 9.291e+01 1.014e+02 1.274e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 14:56:15,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3440940.0, ans=0.0 2023-11-26 14:56:17,658 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516150 2023-11-26 14:56:20,727 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11150, loss[loss=0.0818, simple_loss=0.1218, pruned_loss=0.01393, audio_tagging_loss=0.006957, over 15670.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08867, pruned_loss=0.0119, audio_tagging_loss=0.008917, over 3045071.17 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:56:38,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.25 vs. limit=10.0 2023-11-26 14:56:46,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3441140.0, ans=0.0 2023-11-26 14:56:53,708 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:56:56,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3441206.6666666665, ans=0.125 2023-11-26 14:57:13,684 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516200 2023-11-26 14:57:18,228 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11200, loss[loss=0.06002, simple_loss=0.08076, pruned_loss=0.01151, audio_tagging_loss=0.008125, over 15042.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.0891, pruned_loss=0.01199, audio_tagging_loss=0.00899, over 3039365.34 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:57:19,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.04 vs. limit=22.5 2023-11-26 14:57:30,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.740e+01 9.384e+01 1.028e+02 1.331e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 14:57:38,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3441473.3333333335, ans=0.125 2023-11-26 14:57:59,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3441540.0, ans=0.1 2023-11-26 14:58:06,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3441606.6666666665, ans=0.2 2023-11-26 14:58:06,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.85 vs. limit=10.0 2023-11-26 14:58:10,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516250 2023-11-26 14:58:12,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3441673.3333333335, ans=0.0 2023-11-26 14:58:13,522 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11250, loss[loss=0.07341, simple_loss=0.1, pruned_loss=0.01449, audio_tagging_loss=0.008901, over 14772.00 frames. 
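The `lr: 1.56e-03` printed with every summary above is consistent with the Eden-style schedule used by Zipformer recipes, which decays with both the global batch index and the (fractional) epoch. A sketch under that assumption, plugging in the run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5:

```python
def eden_lr(step: float, epoch: float, base_lr: float = 0.045,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Assumed Eden rule: two inverse-fourth-root decay factors.
    step_factor = ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * step_factor * epoch_factor

# Batch idx ~515300 in epoch 43 gives ~1.55e-03, matching the logged
# 1.56e-03 up to rounding and details ignored here (e.g. the ref_duration
# adjustment to the effective step count).
print(f"{eden_lr(step=515_300, epoch=43):.2e}")
```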
], tot_loss[loss=0.06546, simple_loss=0.0891, pruned_loss=0.01205, audio_tagging_loss=0.008861, over 3043158.11 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:58:15,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3441673.3333333335, ans=0.125 2023-11-26 14:58:19,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3441673.3333333335, ans=0.125 2023-11-26 14:58:21,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3441673.3333333335, ans=0.07 2023-11-26 14:58:24,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3441740.0, ans=0.05 2023-11-26 14:58:41,101 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:58:45,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3441806.6666666665, ans=0.1 2023-11-26 14:58:46,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3441873.3333333335, ans=0.2 2023-11-26 14:58:59,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3441940.0, ans=0.1 2023-11-26 14:59:05,967 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516300 2023-11-26 14:59:09,214 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11300, loss[loss=0.07095, simple_loss=0.1028, pruned_loss=0.01392, audio_tagging_loss=0.00565, over 15528.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08884, pruned_loss=0.01207, audio_tagging_loss=0.00866, over 3038500.76 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:59:09,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3442006.6666666665, ans=0.04949747468305833 2023-11-26 14:59:20,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.80 vs. limit=22.5 2023-11-26 14:59:23,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.03 vs. limit=10.0 2023-11-26 14:59:24,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.246e+01 8.684e+01 9.357e+01 1.017e+02 1.284e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 14:59:29,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3442073.3333333335, ans=0.2 2023-11-26 14:59:50,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3442206.6666666665, ans=0.125 2023-11-26 14:59:53,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3442273.3333333335, ans=0.0 2023-11-26 15:00:02,079 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516350 2023-11-26 15:00:05,784 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11350, loss[loss=0.06732, simple_loss=0.1011, pruned_loss=0.007872, audio_tagging_loss=0.008878, over 14693.00 frames. 
], tot_loss[loss=0.06518, simple_loss=0.08891, pruned_loss=0.01207, audio_tagging_loss=0.008646, over 3042364.05 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:00:10,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0 2023-11-26 15:00:15,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3442340.0, ans=0.0 2023-11-26 15:00:50,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2023-11-26 15:00:55,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3442606.6666666665, ans=0.1 2023-11-26 15:00:55,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.32 vs. limit=22.5 2023-11-26 15:00:58,406 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516400 2023-11-26 15:01:01,798 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11400, loss[loss=0.06278, simple_loss=0.08236, pruned_loss=0.008951, audio_tagging_loss=0.01265, over 14938.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08854, pruned_loss=0.01206, audio_tagging_loss=0.008611, over 3045549.05 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:01:15,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.777e+01 9.213e+01 1.005e+02 1.277e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 15:01:24,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3442806.6666666665, ans=0.2 2023-11-26 15:01:37,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3442873.3333333335, ans=0.1 2023-11-26 15:01:40,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3442873.3333333335, ans=0.1 2023-11-26 15:01:49,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3442940.0, ans=0.0 2023-11-26 15:01:53,937 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516450 2023-11-26 15:01:57,086 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11450, loss[loss=0.04762, simple_loss=0.05937, pruned_loss=0.00721, audio_tagging_loss=0.01072, over 15212.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08819, pruned_loss=0.01204, audio_tagging_loss=0.00865, over 3048684.63 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:02:05,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.88 vs. 
limit=22.5 2023-11-26 15:02:19,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3443140.0, ans=0.125 2023-11-26 15:02:44,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3443273.3333333335, ans=0.125 2023-11-26 15:02:49,880 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516500 2023-11-26 15:02:53,542 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11500, loss[loss=0.0622, simple_loss=0.08329, pruned_loss=0.008031, audio_tagging_loss=0.01253, over 15163.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08926, pruned_loss=0.0122, audio_tagging_loss=0.008573, over 3043770.98 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:03:08,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.894e+01 9.338e+01 1.016e+02 1.234e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 15:03:23,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3443473.3333333335, ans=0.125 2023-11-26 15:03:23,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2023-11-26 15:03:36,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3443540.0, ans=0.2 2023-11-26 15:03:43,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3443606.6666666665, ans=0.0 2023-11-26 15:03:46,755 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516550 2023-11-26 15:03:49,910 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11550, loss[loss=0.06087, simple_loss=0.08152, pruned_loss=0.009408, audio_tagging_loss=0.0107, over 15074.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08968, pruned_loss=0.01229, audio_tagging_loss=0.00857, over 3045997.14 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:03:50,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3443673.3333333335, ans=0.1 2023-11-26 15:03:55,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3443673.3333333335, ans=0.0 2023-11-26 15:04:10,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-26 15:04:25,139 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 15:04:41,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3443940.0, ans=0.2 2023-11-26 15:04:42,169 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516600 2023-11-26 15:04:45,627 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11600, loss[loss=0.07737, simple_loss=0.1101, pruned_loss=0.01686, audio_tagging_loss=0.005485, over 16181.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08912, pruned_loss=0.01215, audio_tagging_loss=0.00866, over 3050824.77 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 15:04:52,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3444006.6666666665, ans=0.5 2023-11-26 15:04:59,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.01 vs. limit=10.0 2023-11-26 15:05:00,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.905e+01 9.507e+01 1.006e+02 1.398e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 15:05:19,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3444206.6666666665, ans=0.125 2023-11-26 15:05:28,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3444206.6666666665, ans=0.125 2023-11-26 15:05:30,597 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:05:32,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3444273.3333333335, ans=0.1 2023-11-26 15:05:38,411 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516650 2023-11-26 15:05:41,436 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11650, loss[loss=0.05793, simple_loss=0.08171, pruned_loss=0.008437, audio_tagging_loss=0.00864, over 15485.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09058, pruned_loss=0.01229, audio_tagging_loss=0.00863, over 3051538.73 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:05:57,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3444406.6666666665, ans=0.125 2023-11-26 15:06:18,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3444540.0, ans=0.125 2023-11-26 15:06:34,837 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516700 2023-11-26 15:06:37,973 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11700, loss[loss=0.04931, simple_loss=0.05942, pruned_loss=0.00834, audio_tagging_loss=0.01126, over 15413.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08949, pruned_loss=0.01211, audio_tagging_loss=0.008731, over 3045918.48 frames. 
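Note how the `grad_scale` field jumps between 8.0, 16.0 and 32.0 across the summaries (16.0 -> 32.0 just above, back to 16.0 in the next one). That is the signature of dynamic loss scaling for the fp16 run: the scale is halved when an overflow is detected and doubled after a long run of clean steps. A generic PyTorch sketch of the mechanism, not the recipe's actual training loop:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def fp16_step(model, optimizer, features, targets, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward in reduced precision
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()      # scale up to avoid fp16 underflow
    scaler.step(optimizer)             # unscales grads; skips step on inf/nan
    scaler.update()                    # adjusts the scale (the logged value)
    return loss.detach()
```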
], batch size: 61, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:06:38,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3444673.3333333335, ans=0.0 2023-11-26 15:06:40,250 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:06:52,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 8.754e+01 9.292e+01 9.879e+01 2.063e+02, threshold=1.858e+02, percent-clipped=1.0 2023-11-26 15:07:29,502 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516750 2023-11-26 15:07:32,642 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11750, loss[loss=0.06064, simple_loss=0.08005, pruned_loss=0.01065, audio_tagging_loss=0.009964, over 15163.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08909, pruned_loss=0.01207, audio_tagging_loss=0.008737, over 3040122.68 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:07:47,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3445073.3333333335, ans=0.125 2023-11-26 15:07:47,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3445073.3333333335, ans=0.125 2023-11-26 15:07:50,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3445073.3333333335, ans=0.5 2023-11-26 15:08:05,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3445140.0, ans=0.125 2023-11-26 15:08:15,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=22.5 2023-11-26 15:08:15,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3445206.6666666665, ans=0.2 2023-11-26 15:08:22,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2023-11-26 15:08:23,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-26 15:08:24,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3445273.3333333335, ans=0.1 2023-11-26 15:08:24,950 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516800 2023-11-26 15:08:28,239 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11800, loss[loss=0.06813, simple_loss=0.09543, pruned_loss=0.01255, audio_tagging_loss=0.007857, over 15094.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08962, pruned_loss=0.01215, audio_tagging_loss=0.008768, over 3039995.11 frames. 
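Each `Clipping_scale=2.0, grad-norm quartiles ...` line reports the recent distribution of gradient norms (min/25%/median/75%/max) plus a clipping threshold that is evidently 2.0 x the median: 2.0 x 9.292e+01 = 1.858e+02 in the entry above, where percent-clipped=1.0 reflects the rare batch (the 2.063e+02 maximum) exceeding it. A sketch of that scheme; the window size and bookkeeping are assumptions, only the 2 x median relationship is read off the log:

```python
import torch
from collections import deque

class MedianGradClipper:
    """Clip to clipping_scale x median of recently observed gradient norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)

    def __call__(self, parameters) -> float:
        parameters = [p for p in parameters if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.norm() for p in parameters])).item()
        self.norms.append(norm)
        threshold = self.scale * sorted(self.norms)[len(self.norms) // 2]
        if norm > threshold:            # the "percent-clipped" case
            for p in parameters:
                p.grad.mul_(threshold / norm)
        return norm
```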
], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:08:40,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3445406.6666666665, ans=0.07 2023-11-26 15:08:43,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3445406.6666666665, ans=0.125 2023-11-26 15:08:45,371 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.777e+01 9.316e+01 1.001e+02 1.366e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 15:08:52,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3445473.3333333335, ans=0.125 2023-11-26 15:08:57,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3445473.3333333335, ans=0.125 2023-11-26 15:09:12,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.00 vs. limit=6.0 2023-11-26 15:09:21,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3445606.6666666665, ans=0.2 2023-11-26 15:09:22,277 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516850 2023-11-26 15:09:25,434 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11850, loss[loss=0.04714, simple_loss=0.05948, pruned_loss=0.007493, audio_tagging_loss=0.009905, over 16344.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09006, pruned_loss=0.01222, audio_tagging_loss=0.008807, over 3039559.67 frames. ], batch size: 63, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:09:41,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3445740.0, ans=0.125 2023-11-26 15:09:48,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3445806.6666666665, ans=0.125 2023-11-26 15:09:58,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3445873.3333333335, ans=0.0 2023-11-26 15:09:58,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3445873.3333333335, ans=0.2 2023-11-26 15:10:14,967 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:10:15,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5 2023-11-26 15:10:17,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2023-11-26 15:10:17,986 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516900 2023-11-26 15:10:21,066 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11900, loss[loss=0.08276, simple_loss=0.1141, pruned_loss=0.01652, audio_tagging_loss=0.009178, over 15327.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08968, pruned_loss=0.01215, audio_tagging_loss=0.008942, over 3035509.00 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:10:35,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.22 vs. 
limit=15.0 2023-11-26 15:10:35,762 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.570e+01 9.365e+01 9.968e+01 1.257e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 15:10:51,430 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:10:55,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3446206.6666666665, ans=0.0 2023-11-26 15:11:04,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3446273.3333333335, ans=0.125 2023-11-26 15:11:13,052 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516950 2023-11-26 15:11:13,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3446273.3333333335, ans=0.125 2023-11-26 15:11:16,147 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11950, loss[loss=0.064, simple_loss=0.08655, pruned_loss=0.007452, audio_tagging_loss=0.01328, over 15293.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08826, pruned_loss=0.012, audio_tagging_loss=0.00915, over 3034583.31 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:11:38,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3446473.3333333335, ans=0.125 2023-11-26 15:11:48,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3446473.3333333335, ans=0.125 2023-11-26 15:11:57,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3446540.0, ans=0.0 2023-11-26 15:12:04,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3446606.6666666665, ans=0.0 2023-11-26 15:12:07,686 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517000 2023-11-26 15:12:11,025 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 12000, loss[loss=0.06777, simple_loss=0.09629, pruned_loss=0.01264, audio_tagging_loss=0.006977, over 15526.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08899, pruned_loss=0.01214, audio_tagging_loss=0.009216, over 3038632.07 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 15:12:11,026 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 15:12:29,836 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2509, 3.0001, 3.2892, 3.0041, 3.7168, 3.7182, 3.2261, 3.1688], device='cuda:1') 2023-11-26 15:12:43,903 INFO [train_asr.py:1267] (1/4) Epoch 43, validation: loss=0.05829, simple_loss=0.05056, pruned_loss=0.00528, audio_tagging_loss=0.02773, over 4681554.00 frames. 2023-11-26 15:12:43,904 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 15:12:45,218 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:12:52,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. 
limit=10.0 2023-11-26 15:12:58,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 8.906e+01 9.562e+01 1.016e+02 1.213e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 15:13:02,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2023-11-26 15:13:38,092 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 0, loss[loss=0.06962, simple_loss=0.06833, pruned_loss=0.01052, audio_tagging_loss=0.02494, over 15540.00 frames. ], tot_loss[loss=0.06962, simple_loss=0.06833, pruned_loss=0.01052, audio_tagging_loss=0.02494, over 15540.00 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:13:38,092 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 15:14:00,557 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3073, 4.2773, 4.4607, 4.4578], device='cuda:1') 2023-11-26 15:14:09,397 INFO [train_asr.py:1267] (1/4) Epoch 44, validation: loss=0.05821, simple_loss=0.05063, pruned_loss=0.005319, audio_tagging_loss=0.02758, over 4681554.00 frames. 2023-11-26 15:14:09,398 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 15:14:14,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3446840.0, ans=0.1 2023-11-26 15:14:28,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3446906.6666666665, ans=0.125 2023-11-26 15:14:34,459 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517050 2023-11-26 15:14:34,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3446973.3333333335, ans=0.0 2023-11-26 15:14:51,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3447040.0, ans=15.0 2023-11-26 15:14:54,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3447106.6666666665, ans=0.2 2023-11-26 15:14:57,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3447106.6666666665, ans=0.125 2023-11-26 15:14:58,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3447106.6666666665, ans=0.1 2023-11-26 15:15:05,080 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 50, loss[loss=0.07465, simple_loss=0.08534, pruned_loss=0.01299, audio_tagging_loss=0.01899, over 14428.00 frames. ], tot_loss[loss=0.07391, simple_loss=0.0887, pruned_loss=0.0119, audio_tagging_loss=0.01766, over 694713.51 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:15:05,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3447173.3333333335, ans=0.125 2023-11-26 15:15:16,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.70 vs. 
limit=22.5 2023-11-26 15:15:30,265 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517100 2023-11-26 15:15:34,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447306.6666666665, ans=0.1 2023-11-26 15:15:39,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447373.3333333335, ans=0.1 2023-11-26 15:15:47,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3447373.3333333335, ans=0.0 2023-11-26 15:15:48,793 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.397e+01 9.647e+01 1.037e+02 1.149e+02 1.439e+02, threshold=2.073e+02, percent-clipped=0.0 2023-11-26 15:15:57,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3447440.0, ans=0.125 2023-11-26 15:16:01,679 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 100, loss[loss=0.08124, simple_loss=0.1078, pruned_loss=0.01333, audio_tagging_loss=0.01402, over 15868.00 frames. ], tot_loss[loss=0.07268, simple_loss=0.08787, pruned_loss=0.0119, audio_tagging_loss=0.01685, over 1213634.96 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:16:04,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3447506.6666666665, ans=0.125 2023-11-26 15:16:12,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447573.3333333335, ans=0.1 2023-11-26 15:16:20,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3447573.3333333335, ans=0.125 2023-11-26 15:16:26,511 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517150 2023-11-26 15:16:31,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447640.0, ans=0.1 2023-11-26 15:16:35,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3447706.6666666665, ans=0.125 2023-11-26 15:16:38,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3447706.6666666665, ans=0.0 2023-11-26 15:16:42,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.16 vs. limit=15.0 2023-11-26 15:16:58,403 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 150, loss[loss=0.09914, simple_loss=0.1425, pruned_loss=0.02206, audio_tagging_loss=0.005814, over 17131.00 frames. ], tot_loss[loss=0.07118, simple_loss=0.08876, pruned_loss=0.01202, audio_tagging_loss=0.01478, over 1626621.95 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:17:03,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2023-11-26 15:17:09,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3447906.6666666665, ans=0.125 2023-11-26 15:17:10,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3447906.6666666665, ans=0.125 2023-11-26 15:17:23,552 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517200 2023-11-26 15:17:27,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=22.5 2023-11-26 15:17:42,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3448106.6666666665, ans=0.125 2023-11-26 15:17:43,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.130e+01 9.675e+01 1.049e+02 1.216e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-26 15:17:47,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3448106.6666666665, ans=0.125 2023-11-26 15:17:49,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2023-11-26 15:17:54,486 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 200, loss[loss=0.07193, simple_loss=0.09805, pruned_loss=0.01592, audio_tagging_loss=0.006982, over 15809.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.0887, pruned_loss=0.01194, audio_tagging_loss=0.01298, over 1942213.42 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:17:59,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3448173.3333333335, ans=0.015 2023-11-26 15:18:19,042 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517250 2023-11-26 15:18:26,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3448306.6666666665, ans=0.1 2023-11-26 15:18:26,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.79 vs. limit=15.0 2023-11-26 15:18:36,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3448373.3333333335, ans=0.125 2023-11-26 15:18:51,355 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 250, loss[loss=0.06683, simple_loss=0.0951, pruned_loss=0.01135, audio_tagging_loss=0.00793, over 14959.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.08948, pruned_loss=0.01207, audio_tagging_loss=0.01169, over 2185999.78 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:18:59,053 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:19:08,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. 
limit=15.0 2023-11-26 15:19:15,425 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517300 2023-11-26 15:19:19,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3448640.0, ans=0.02 2023-11-26 15:19:36,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.917e+01 9.750e+01 1.047e+02 1.492e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-26 15:19:46,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3448840.0, ans=0.1 2023-11-26 15:19:46,944 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 300, loss[loss=0.06415, simple_loss=0.08961, pruned_loss=0.008428, audio_tagging_loss=0.01091, over 15141.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.09101, pruned_loss=0.01224, audio_tagging_loss=0.01074, over 2377629.88 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:19:48,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3448840.0, ans=0.125 2023-11-26 15:19:53,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-26 15:20:00,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3448906.6666666665, ans=0.125 2023-11-26 15:20:05,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.06 vs. limit=15.0 2023-11-26 15:20:06,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2023-11-26 15:20:12,220 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517350 2023-11-26 15:20:16,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3448973.3333333335, ans=0.0 2023-11-26 15:20:33,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=12.0 2023-11-26 15:20:40,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2023-11-26 15:20:43,514 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 350, loss[loss=0.05693, simple_loss=0.07177, pruned_loss=0.009308, audio_tagging_loss=0.01174, over 16170.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.0913, pruned_loss=0.01221, audio_tagging_loss=0.01027, over 2534294.09 frames. 
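The `Whitening: ... metric=X vs. limit=Y` lines scattered through the log track how far a group of activations is from having a white (isotropic) covariance; a penalty engages only when the metric exceeds the limit. One standard metric with the right properties is sketched below: it equals 1.0 for perfectly white features and grows with anisotropy, which the logged magnitudes are consistent with. Whether this is the module's exact formula is an assumption:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one channel group.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    # trace(C @ C) * d / trace(C)^2: 1.0 iff C is a multiple of identity.
    return (torch.trace(cov @ cov) * d / torch.trace(cov) ** 2).item()

white = torch.randn(10_000, 64)
assert abs(whitening_metric(white) - 1.0) < 0.1
```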
], batch size: 64, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:20:43,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3449173.3333333335, ans=0.125 2023-11-26 15:20:45,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3449173.3333333335, ans=0.125 2023-11-26 15:20:49,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3449173.3333333335, ans=0.0 2023-11-26 15:20:52,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3449173.3333333335, ans=0.125 2023-11-26 15:20:54,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3449240.0, ans=0.125 2023-11-26 15:20:58,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3449240.0, ans=0.125 2023-11-26 15:21:07,968 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517400 2023-11-26 15:21:23,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-26 15:21:25,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.63 vs. limit=22.5 2023-11-26 15:21:26,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3449373.3333333335, ans=0.125 2023-11-26 15:21:28,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.037e+01 9.510e+01 1.047e+02 1.188e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 15:21:28,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3449440.0, ans=0.125 2023-11-26 15:21:33,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2023-11-26 15:21:40,192 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 400, loss[loss=0.06224, simple_loss=0.08428, pruned_loss=0.0103, audio_tagging_loss=0.009799, over 15941.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09048, pruned_loss=0.0122, audio_tagging_loss=0.00994, over 2648155.39 frames. 
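The `attn_weights_entropy` tensors dumped during the validation passes above read naturally as per-head entropies of the self-attention distributions: higher values (the logged 2-5 nats) mean attention spread over many keys, values near log(seq_len) mean nearly uniform attention. A hedged sketch of such a diagnostic; the exact reduction used by the model's own logging is an assumption:

```python
import torch

def attention_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len); rows are softmax distributions.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, query_len)
    return ent.mean(dim=-1)                           # per-head average

uniform = torch.full((4, 10, 200), 1.0 / 200)
print(attention_entropy(uniform))  # ~log(200) = 5.30 nats per head
```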
], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:22:03,999 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517450 2023-11-26 15:22:25,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3449773.3333333335, ans=0.015 2023-11-26 15:22:28,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3449773.3333333335, ans=0.2 2023-11-26 15:22:29,268 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:22:33,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3449773.3333333335, ans=0.125 2023-11-26 15:22:35,252 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 450, loss[loss=0.07835, simple_loss=0.1043, pruned_loss=0.01638, audio_tagging_loss=0.009815, over 14803.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09082, pruned_loss=0.01234, audio_tagging_loss=0.009614, over 2740886.29 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:22:45,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3449906.6666666665, ans=0.125 2023-11-26 15:22:56,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=15.0 2023-11-26 15:22:58,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3449973.3333333335, ans=0.0 2023-11-26 15:23:00,152 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517500 2023-11-26 15:23:20,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.956e+01 9.480e+01 1.000e+02 1.239e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 15:23:21,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=22.5 2023-11-26 15:23:31,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3450173.3333333335, ans=0.0 2023-11-26 15:23:31,849 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 500, loss[loss=0.06962, simple_loss=0.08891, pruned_loss=0.01649, audio_tagging_loss=0.008674, over 15410.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09037, pruned_loss=0.0123, audio_tagging_loss=0.009263, over 2807741.68 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:23:34,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3450173.3333333335, ans=0.125 2023-11-26 15:23:41,143 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:23:49,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3450240.0, ans=0.025 2023-11-26 15:23:57,012 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517550 2023-11-26 15:24:05,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3450373.3333333335, ans=0.2 2023-11-26 15:24:07,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3450373.3333333335, ans=0.125 2023-11-26 15:24:07,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3450373.3333333335, ans=0.07 2023-11-26 15:24:11,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0 2023-11-26 15:24:18,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=22.5 2023-11-26 15:24:28,926 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 550, loss[loss=0.0613, simple_loss=0.08954, pruned_loss=0.008918, audio_tagging_loss=0.007609, over 15299.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08916, pruned_loss=0.01211, audio_tagging_loss=0.009098, over 2861934.05 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:24:29,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3450506.6666666665, ans=0.125 2023-11-26 15:24:40,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.90 vs. limit=10.0 2023-11-26 15:24:52,337 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517600 2023-11-26 15:25:02,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0 2023-11-26 15:25:11,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3450706.6666666665, ans=0.125 2023-11-26 15:25:14,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 8.780e+01 9.554e+01 1.038e+02 1.321e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 15:25:14,816 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:25:24,058 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 600, loss[loss=0.06894, simple_loss=0.09071, pruned_loss=0.01465, audio_tagging_loss=0.008928, over 15067.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08922, pruned_loss=0.0122, audio_tagging_loss=0.00913, over 2900750.80 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:25:48,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517650 2023-11-26 15:26:14,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3451106.6666666665, ans=0.05 2023-11-26 15:26:15,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3451106.6666666665, ans=0.125 2023-11-26 15:26:19,361 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 650, loss[loss=0.09038, simple_loss=0.1293, pruned_loss=0.01843, audio_tagging_loss=0.007292, over 15761.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08954, pruned_loss=0.01237, audio_tagging_loss=0.009059, over 2934565.39 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:26:27,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3451173.3333333335, ans=0.2 2023-11-26 15:26:31,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3451240.0, ans=0.125 2023-11-26 15:26:32,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3451240.0, ans=0.1 2023-11-26 15:26:40,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-11-26 15:26:45,084 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517700 2023-11-26 15:26:58,111 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:27:06,894 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.698e+01 9.351e+01 9.946e+01 1.223e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 15:27:15,789 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 700, loss[loss=0.08518, simple_loss=0.1249, pruned_loss=0.01681, audio_tagging_loss=0.005947, over 15883.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08931, pruned_loss=0.01213, audio_tagging_loss=0.009007, over 2960690.02 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:27:24,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3451506.6666666665, ans=0.125 2023-11-26 15:27:26,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3451573.3333333335, ans=0.1 2023-11-26 15:27:40,477 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517750 2023-11-26 15:27:43,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=12.0 2023-11-26 15:27:43,973 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:27:45,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3451640.0, ans=0.0 2023-11-26 15:28:12,469 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 750, loss[loss=0.08756, simple_loss=0.1207, pruned_loss=0.01943, audio_tagging_loss=0.007776, over 15487.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0903, pruned_loss=0.01229, audio_tagging_loss=0.008926, over 2978353.42 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0
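In every optim.py:476 record, the logged threshold is exactly Clipping_scale times the middle of the five grad-norm statistics (min, 25%, median, 75%, max): for the batch-700 entry above, 2.0 * 9.351e+01 = 1.870e+02. A minimal sketch of that relationship; the idea that the quartiles are tracked over a window of recent batches is an assumption, not read out of optim.py:

```python
import numpy as np

def clipping_stats(recent_grad_norms, clipping_scale=2.0):
    # Quantiles in the same order as the log: min, 25%, median, 75%, max.
    q = np.percentile(recent_grad_norms, [0, 25, 50, 75, 100])
    threshold = clipping_scale * q[2]  # scale the median
    return q, threshold

q, thr = clipping_stats([72.52, 86.98, 93.51, 99.46, 122.3])
print(thr)  # 187.02, matching the logged "threshold=1.870e+02"
```

The percent-clipped field then reports how often a batch's grad norm exceeded that threshold; it stays at 0.0 throughout this stretch except for isolated spikes (e.g. the max of 2.680e+02 later in the log).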
2023-11-26 15:28:27,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2023-11-26 15:28:35,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3451973.3333333335, ans=0.125 2023-11-26 15:28:36,638 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517800 2023-11-26 15:28:41,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-26 15:28:43,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3451973.3333333335, ans=0.1 2023-11-26 15:28:59,978 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.944e+01 9.681e+01 1.076e+02 1.736e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-26 15:29:04,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3452106.6666666665, ans=0.1 2023-11-26 15:29:05,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3452106.6666666665, ans=0.125 2023-11-26 15:29:08,442 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 800, loss[loss=0.07402, simple_loss=0.09978, pruned_loss=0.01443, audio_tagging_loss=0.009704, over 15946.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.0904, pruned_loss=0.01236, audio_tagging_loss=0.008957, over 2995208.40 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:29:12,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3452173.3333333335, ans=0.0 2023-11-26 15:29:15,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3452173.3333333335, ans=0.0 2023-11-26 15:29:15,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.54 vs. limit=6.0 2023-11-26 15:29:16,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3452173.3333333335, ans=0.125 2023-11-26 15:29:18,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.50 vs. limit=22.5 2023-11-26 15:29:19,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3452240.0, ans=0.0 2023-11-26 15:29:34,031 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517850 2023-11-26 15:30:02,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.31 vs. 
limit=22.5 2023-11-26 15:30:03,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3452506.6666666665, ans=0.2 2023-11-26 15:30:03,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3452506.6666666665, ans=0.0 2023-11-26 15:30:04,063 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 850, loss[loss=0.03585, simple_loss=0.03652, pruned_loss=0.003972, audio_tagging_loss=0.01361, over 14404.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08987, pruned_loss=0.01223, audio_tagging_loss=0.009046, over 3009353.15 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:30:07,081 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.569e-03 2023-11-26 15:30:29,232 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517900 2023-11-26 15:30:33,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3452640.0, ans=0.2 2023-11-26 15:30:36,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.38 vs. limit=15.0 2023-11-26 15:30:48,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.48 vs. limit=22.5 2023-11-26 15:30:52,037 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 8.827e+01 9.589e+01 1.017e+02 1.364e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 15:31:00,604 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 900, loss[loss=0.0545, simple_loss=0.07242, pruned_loss=0.01059, audio_tagging_loss=0.007702, over 13830.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0902, pruned_loss=0.01223, audio_tagging_loss=0.009058, over 3014582.24 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:31:09,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0 2023-11-26 15:31:16,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3452906.6666666665, ans=0.125 2023-11-26 15:31:24,581 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517950 2023-11-26 15:31:29,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3452973.3333333335, ans=0.1 2023-11-26 15:31:41,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3453040.0, ans=10.0 2023-11-26 15:31:43,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3453106.6666666665, ans=0.0 2023-11-26 15:31:50,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.18 vs. limit=12.0 2023-11-26 15:31:54,758 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 950, loss[loss=0.04912, simple_loss=0.06814, pruned_loss=0.00577, audio_tagging_loss=0.009276, over 16488.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09026, pruned_loss=0.01215, audio_tagging_loss=0.008982, over 3022119.09 frames. 
], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:31:55,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3453173.3333333335, ans=0.1 2023-11-26 15:32:00,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3453173.3333333335, ans=0.125 2023-11-26 15:32:07,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3453240.0, ans=0.07 2023-11-26 15:32:15,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-11-26 15:32:20,085 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518000 2023-11-26 15:32:30,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3453373.3333333335, ans=0.125 2023-11-26 15:32:38,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3453440.0, ans=0.125 2023-11-26 15:32:41,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.765e+01 9.471e+01 9.957e+01 1.208e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 15:32:42,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3453440.0, ans=0.125 2023-11-26 15:32:50,867 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1000, loss[loss=0.08324, simple_loss=0.1182, pruned_loss=0.01674, audio_tagging_loss=0.007414, over 16038.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.0899, pruned_loss=0.01214, audio_tagging_loss=0.008929, over 3026544.07 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:32:57,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3453506.6666666665, ans=0.125 2023-11-26 15:33:12,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3453640.0, ans=0.1 2023-11-26 15:33:14,244 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
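These WARNINGs come from a feasibility filter: a transducer loss is only defined when the encoder emits at least as many frames as there are target tokens, and a one-second AudioSet cut carrying the dummy transcript has 24 BPE tokens but only 23 frames left after subsampling its 100 input frames. A sketch of such a filter; the exact subsampling arithmetic is an assumption chosen to match the logged 100 -> 23:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed convolutional front-end arithmetic; maps 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A pruned RNN-T loss needs T >= number of target tokens.
    return frames_after_subsampling(num_frames) >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded and logged
```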
2023-11-26 15:33:15,314 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518050 2023-11-26 15:33:19,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3453640.0, ans=0.0 2023-11-26 15:33:33,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3453706.6666666665, ans=0.125 2023-11-26 15:33:34,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3453773.3333333335, ans=0.125 2023-11-26 15:33:35,426 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:33:37,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3453773.3333333335, ans=0.015 2023-11-26 15:33:41,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3453773.3333333335, ans=0.125 2023-11-26 15:33:46,943 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1050, loss[loss=0.07503, simple_loss=0.0998, pruned_loss=0.01526, audio_tagging_loss=0.009871, over 15603.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09003, pruned_loss=0.01218, audio_tagging_loss=0.008786, over 3036159.10 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:33:50,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3453840.0, ans=0.1 2023-11-26 15:33:51,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.86 vs. limit=15.0 2023-11-26 15:33:58,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3453906.6666666665, ans=0.0 2023-11-26 15:34:04,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3453906.6666666665, ans=0.0 2023-11-26 15:34:11,138 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518100 2023-11-26 15:34:21,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2023-11-26 15:34:29,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2023-11-26 15:34:33,433 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.861e+01 9.465e+01 1.011e+02 1.415e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 15:34:42,001 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1100, loss[loss=0.07, simple_loss=0.08934, pruned_loss=0.01484, audio_tagging_loss=0.0105, over 15631.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.0898, pruned_loss=0.01207, audio_tagging_loss=0.008668, over 3051820.97 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:34:45,232 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:34:49,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3454173.3333333335, ans=0.1 2023-11-26 15:34:52,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3454240.0, ans=0.0 2023-11-26 15:35:00,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3454240.0, ans=0.0 2023-11-26 15:35:02,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3454306.6666666665, ans=0.125 2023-11-26 15:35:06,600 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518150 2023-11-26 15:35:13,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3454306.6666666665, ans=0.0 2023-11-26 15:35:29,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3454440.0, ans=0.125 2023-11-26 15:35:31,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3454440.0, ans=0.035 2023-11-26 15:35:37,336 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1150, loss[loss=0.05055, simple_loss=0.06627, pruned_loss=0.009377, audio_tagging_loss=0.008032, over 14865.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08918, pruned_loss=0.01191, audio_tagging_loss=0.008702, over 3047005.05 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:35:47,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3454573.3333333335, ans=0.2 2023-11-26 15:36:01,836 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518200 2023-11-26 15:36:03,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3454640.0, ans=0.1 2023-11-26 15:36:15,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3454706.6666666665, ans=0.1 2023-11-26 15:36:24,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 8.766e+01 9.295e+01 9.999e+01 1.209e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 15:36:32,878 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1200, loss[loss=0.07293, simple_loss=0.1021, pruned_loss=0.01375, audio_tagging_loss=0.008139, over 15405.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08988, pruned_loss=0.012, audio_tagging_loss=0.008577, over 3045267.45 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:36:55,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.86 vs. 
limit=15.0 2023-11-26 15:36:57,044 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518250 2023-11-26 15:36:58,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3454973.3333333335, ans=0.125 2023-11-26 15:37:09,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3455040.0, ans=0.05 2023-11-26 15:37:23,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=12.0 2023-11-26 15:37:28,277 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1250, loss[loss=0.06889, simple_loss=0.0879, pruned_loss=0.01611, audio_tagging_loss=0.00883, over 16294.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08975, pruned_loss=0.01221, audio_tagging_loss=0.008614, over 3043112.50 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:37:30,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2023-11-26 15:37:52,852 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518300 2023-11-26 15:38:11,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3455440.0, ans=0.125 2023-11-26 15:38:13,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3455440.0, ans=0.2 2023-11-26 15:38:16,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.858e+01 9.436e+01 1.015e+02 1.276e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 15:38:23,784 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1300, loss[loss=0.0644, simple_loss=0.08748, pruned_loss=0.01275, audio_tagging_loss=0.007908, over 16420.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08954, pruned_loss=0.01216, audio_tagging_loss=0.008512, over 3041648.69 frames. 
], batch size: 63, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:38:25,058 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:38:35,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3455573.3333333335, ans=0.125 2023-11-26 15:38:35,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3455573.3333333335, ans=0.0 2023-11-26 15:38:37,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3455573.3333333335, ans=0.0 2023-11-26 15:38:38,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3455573.3333333335, ans=0.0 2023-11-26 15:38:38,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3455573.3333333335, ans=0.0 2023-11-26 15:38:41,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3455573.3333333335, ans=0.0 2023-11-26 15:38:48,244 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518350 2023-11-26 15:39:00,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3455706.6666666665, ans=0.125 2023-11-26 15:39:02,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3455706.6666666665, ans=0.1 2023-11-26 15:39:19,313 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1350, loss[loss=0.0554, simple_loss=0.07511, pruned_loss=0.009415, audio_tagging_loss=0.008428, over 14756.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08969, pruned_loss=0.01221, audio_tagging_loss=0.008582, over 3047021.33 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:39:35,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3455906.6666666665, ans=0.125 2023-11-26 15:39:40,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3455973.3333333335, ans=0.125 2023-11-26 15:39:43,434 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518400 2023-11-26 15:40:00,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.14 vs. limit=15.0 2023-11-26 15:40:01,172 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 15:40:03,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3456106.6666666665, ans=0.125 2023-11-26 15:40:08,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.757e+01 9.290e+01 1.016e+02 1.312e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 15:40:15,048 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1400, loss[loss=0.05234, simple_loss=0.06713, pruned_loss=0.008797, audio_tagging_loss=0.009979, over 15354.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08983, pruned_loss=0.0124, audio_tagging_loss=0.008682, over 3048530.58 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:40:36,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3456306.6666666665, ans=0.035 2023-11-26 15:40:39,869 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518450 2023-11-26 15:41:01,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3456440.0, ans=0.125 2023-11-26 15:41:10,961 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1450, loss[loss=0.06381, simple_loss=0.09173, pruned_loss=0.009937, audio_tagging_loss=0.008007, over 14017.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08969, pruned_loss=0.01229, audio_tagging_loss=0.008742, over 3056012.18 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:41:31,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3456573.3333333335, ans=0.0 2023-11-26 15:41:31,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3456573.3333333335, ans=0.0 2023-11-26 15:41:36,279 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518500 2023-11-26 15:41:41,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3456640.0, ans=0.0 2023-11-26 15:41:52,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3456706.6666666665, ans=0.125 2023-11-26 15:41:59,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2023-11-26 15:42:00,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.759e+01 9.010e+01 9.704e+01 1.035e+02 1.675e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-26 15:42:04,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3456773.3333333335, ans=0.0 2023-11-26 15:42:05,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3456773.3333333335, ans=0.0 2023-11-26 15:42:08,023 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1500, loss[loss=0.06104, simple_loss=0.0833, pruned_loss=0.009759, audio_tagging_loss=0.009633, over 16746.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09029, pruned_loss=0.01246, audio_tagging_loss=0.008813, over 3056192.85 frames. 
], batch size: 62, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:42:08,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3456840.0, ans=0.125 2023-11-26 15:42:17,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3456840.0, ans=0.125 2023-11-26 15:42:28,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2023-11-26 15:42:32,716 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518550 2023-11-26 15:42:46,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3457040.0, ans=0.125 2023-11-26 15:42:50,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3457040.0, ans=0.2 2023-11-26 15:42:54,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3457106.6666666665, ans=0.125 2023-11-26 15:43:03,427 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1550, loss[loss=0.08778, simple_loss=0.1337, pruned_loss=0.0149, audio_tagging_loss=0.006012, over 15513.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08982, pruned_loss=0.01229, audio_tagging_loss=0.008887, over 3054888.62 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:43:12,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3457173.3333333335, ans=0.2 2023-11-26 15:43:13,722 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:43:18,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-26 15:43:27,909 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518600 2023-11-26 15:43:30,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3457306.6666666665, ans=0.125 2023-11-26 15:43:37,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3457373.3333333335, ans=0.2 2023-11-26 15:43:52,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.861e+01 9.494e+01 1.024e+02 1.186e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 15:43:54,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3457440.0, ans=0.125 2023-11-26 15:43:57,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3457440.0, ans=10.0 2023-11-26 15:43:59,621 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1600, loss[loss=0.04494, simple_loss=0.05335, pruned_loss=0.006735, audio_tagging_loss=0.01153, over 14878.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08992, pruned_loss=0.01236, audio_tagging_loss=0.008955, over 3052141.38 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0
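The scaling.py:213 lines record ScheduledFloat values: module hyper-parameters (dropout_p, skip rates, balancer probs, scale_min, min_abs and the like) that are functions of batch_count rather than fixed constants. A sketch of the idea as a piecewise-linear schedule; the breakpoints below are illustrative only, and by batch_count around 3.46e6 any such schedule in this run has long since reached its final value, which is why the logged ans fields sit at constants like 0.1 and 0.125:

```python
def scheduled_float(batch_count: float,
                    schedule=((0.0, 0.3), (20000.0, 0.1))) -> float:
    """Piecewise-linear interpolation of a hyper-parameter in batch_count."""
    (x0, y0), (x1, y1) = schedule
    if batch_count <= x0:
        return y0
    if batch_count >= x1:
        return y1
    frac = (batch_count - x0) / (x1 - x0)
    return y0 + frac * (y1 - y0)

print(scheduled_float(3456840.0))  # 0.1 -- far past the last breakpoint
```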
2023-11-26 15:44:05,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3457506.6666666665, ans=0.125 2023-11-26 15:44:12,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3457573.3333333335, ans=0.0 2023-11-26 15:44:14,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3457573.3333333335, ans=0.125 2023-11-26 15:44:24,890 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518650 2023-11-26 15:44:31,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0 2023-11-26 15:44:49,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3457773.3333333335, ans=0.125 2023-11-26 15:44:55,867 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1650, loss[loss=0.07026, simple_loss=0.0886, pruned_loss=0.01702, audio_tagging_loss=0.008938, over 15197.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08973, pruned_loss=0.01229, audio_tagging_loss=0.008981, over 3053177.07 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:44:56,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3457840.0, ans=0.1 2023-11-26 15:45:03,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.39 vs. limit=15.0 2023-11-26 15:45:20,413 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518700 2023-11-26 15:45:25,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457973.3333333335, ans=0.1 2023-11-26 15:45:28,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3458040.0, ans=0.125 2023-11-26 15:45:40,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3458106.6666666665, ans=0.125 2023-11-26 15:45:45,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3458106.6666666665, ans=0.125 2023-11-26 15:45:45,761 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 9.023e+01 9.377e+01 1.009e+02 1.256e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 15:45:50,442 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:45:52,306 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1700, loss[loss=0.06656, simple_loss=0.08378, pruned_loss=0.01583, audio_tagging_loss=0.008831, over 13748.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08932, pruned_loss=0.0121, audio_tagging_loss=0.008963, over 3050782.17 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:45:58,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.91 vs. limit=10.0
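The Whitening lines compare a per-module statistic against a limit, e.g. metric=5.91 vs. limit=10.0 just above. One plausible reading, an assumption rather than anything lifted from scaling.py, is that the metric measures how far the feature covariance is from a multiple of the identity: the ratio mean(eig^2) / mean(eig)^2 over covariance eigenvalues, which is 1.0 for perfectly white features and grows with eigenvalue spread, so a module only intervenes when the statistic nears its limit.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one whitening group."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 192)   # near-white features
print(whitening_metric(x))   # close to 1.0
```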
2023-11-26 15:46:14,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2023-11-26 15:46:16,690 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518750 2023-11-26 15:46:42,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2023-11-26 15:46:47,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2023-11-26 15:46:47,673 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1750, loss[loss=0.04915, simple_loss=0.07018, pruned_loss=0.0053, audio_tagging_loss=0.008757, over 14535.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08906, pruned_loss=0.01193, audio_tagging_loss=0.008931, over 3045464.92 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:46:57,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.95 vs. limit=15.0 2023-11-26 15:46:58,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3458573.3333333335, ans=0.0 2023-11-26 15:47:00,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3458573.3333333335, ans=0.125 2023-11-26 15:47:13,187 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518800 2023-11-26 15:47:22,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3458706.6666666665, ans=0.125 2023-11-26 15:47:25,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.75 vs. limit=6.0 2023-11-26 15:47:32,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3458773.3333333335, ans=0.125 2023-11-26 15:47:38,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.660e+01 9.290e+01 1.019e+02 1.190e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 15:47:44,455 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1800, loss[loss=0.06449, simple_loss=0.09618, pruned_loss=0.01003, audio_tagging_loss=0.006371, over 14889.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08944, pruned_loss=0.01186, audio_tagging_loss=0.008786, over 3049725.36 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:47:48,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3458840.0, ans=0.1 2023-11-26 15:48:05,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3458906.6666666665, ans=0.1 2023-11-26 15:48:08,971 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518850 2023-11-26 15:48:09,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3458973.3333333335, ans=0.125 2023-11-26 15:48:14,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3458973.3333333335, ans=0.125 2023-11-26 15:48:18,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3459040.0, ans=0.1 2023-11-26 15:48:40,879 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1850, loss[loss=0.06844, simple_loss=0.098, pruned_loss=0.0133, audio_tagging_loss=0.006132, over 14729.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08923, pruned_loss=0.01212, audio_tagging_loss=0.008698, over 3049729.39 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:49:02,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3459306.6666666665, ans=0.0 2023-11-26 15:49:02,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3459306.6666666665, ans=0.125 2023-11-26 15:49:04,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3459306.6666666665, ans=0.0 2023-11-26 15:49:05,184 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518900 2023-11-26 15:49:11,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3459306.6666666665, ans=0.0 2023-11-26 15:49:28,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3459440.0, ans=0.1 2023-11-26 15:49:30,365 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.036e+01 8.755e+01 9.422e+01 1.025e+02 1.230e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 15:49:36,705 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1900, loss[loss=0.08438, simple_loss=0.1112, pruned_loss=0.01922, audio_tagging_loss=0.009583, over 15193.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08868, pruned_loss=0.01192, audio_tagging_loss=0.008693, over 3054241.93 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:49:51,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3459573.3333333335, ans=0.125 2023-11-26 15:50:02,405 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518950 2023-11-26 15:50:03,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3459640.0, ans=0.125 2023-11-26 15:50:09,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3459640.0, ans=0.125 2023-11-26 15:50:28,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3459773.3333333335, ans=0.1 2023-11-26 15:50:33,052 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1950, loss[loss=0.05485, simple_loss=0.07046, pruned_loss=0.01109, audio_tagging_loss=0.008531, over 14850.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08869, pruned_loss=0.01198, audio_tagging_loss=0.00865, over 3060352.25 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:50:36,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=22.5 2023-11-26 15:50:38,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3459840.0, ans=0.0 2023-11-26 15:50:43,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3459840.0, ans=0.2 2023-11-26 15:50:55,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3459973.3333333335, ans=0.1 2023-11-26 15:50:58,244 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519000 2023-11-26 15:51:11,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3460040.0, ans=0.125 2023-11-26 15:51:22,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460106.6666666665, ans=0.1 2023-11-26 15:51:23,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.668e+01 9.341e+01 1.000e+02 1.329e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 15:51:23,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3460106.6666666665, ans=0.125 2023-11-26 15:51:25,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3460106.6666666665, ans=0.0 2023-11-26 15:51:28,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2023-11-26 15:51:30,362 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2000, loss[loss=0.04885, simple_loss=0.05863, pruned_loss=0.01056, audio_tagging_loss=0.008982, over 15409.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.0887, pruned_loss=0.01212, audio_tagging_loss=0.008761, over 3060255.62 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0
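The grad_scale field in the batch records tracks dynamic fp16 loss scaling: the scale halves when a scaled gradient overflows and creeps back up after a run of clean steps, which is why it drops through 16 and 8 earlier in this section and doubles back to 32.0 at batch 2000 above. A generic torch.cuda.amp sketch of that mechanism; train_asr.py wires its own scaler into the loop, so this is just the standard pattern, not the actual training code:

```python
import torch

scaler = torch.cuda.amp.GradScaler(growth_interval=2000)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skipped internally on overflow
    scaler.update()                # halves or grows the scale (grad_scale)
    return loss.detach()
```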
2023-11-26 15:51:52,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0 2023-11-26 15:51:54,341 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519050 2023-11-26 15:52:03,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2023-11-26 15:52:22,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3460440.0, ans=0.125 2023-11-26 15:52:23,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3460440.0, ans=0.1 2023-11-26 15:52:23,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3460440.0, ans=0.2 2023-11-26 15:52:26,022 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2050, loss[loss=0.05953, simple_loss=0.07641, pruned_loss=0.01205, audio_tagging_loss=0.009277, over 14575.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08886, pruned_loss=0.01217, audio_tagging_loss=0.008674, over 3056804.73 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:52:28,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3460506.6666666665, ans=0.125 2023-11-26 15:52:38,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3460573.3333333335, ans=0.125 2023-11-26 15:52:46,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460573.3333333335, ans=0.1 2023-11-26 15:52:51,670 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519100 2023-11-26 15:52:58,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3460640.0, ans=0.5 2023-11-26 15:53:07,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460706.6666666665, ans=0.1 2023-11-26 15:53:10,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2023-11-26 15:53:14,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3460773.3333333335, ans=0.0 2023-11-26 15:53:15,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.697e+01 9.387e+01 1.014e+02 2.680e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-26 15:53:19,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3460773.3333333335, ans=0.0 2023-11-26 15:53:21,940 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2100, loss[loss=0.06594, simple_loss=0.09515, pruned_loss=0.01063, audio_tagging_loss=0.007726, over 15282.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08926, pruned_loss=0.01225, audio_tagging_loss=0.00858, over 3056959.74 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:53:41,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3460906.6666666665, ans=0.0 2023-11-26 15:53:46,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519150 2023-11-26 15:54:01,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.63 vs. limit=15.0 2023-11-26 15:54:17,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3461173.3333333335, ans=0.0 2023-11-26 15:54:18,650 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2150, loss[loss=0.0595, simple_loss=0.08547, pruned_loss=0.01073, audio_tagging_loss=0.006028, over 15340.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08908, pruned_loss=0.01223, audio_tagging_loss=0.008656, over 3055209.72 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:54:19,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3461173.3333333335, ans=0.125 2023-11-26 15:54:29,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3461240.0, ans=0.0 2023-11-26 15:54:32,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3461240.0, ans=0.0 2023-11-26 15:54:42,982 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519200 2023-11-26 15:54:44,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3461306.6666666665, ans=0.125 2023-11-26 15:54:47,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3461306.6666666665, ans=0.125 2023-11-26 15:54:52,776 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:55:07,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 9.113e+01 9.715e+01 1.044e+02 1.389e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-26 15:55:14,125 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2200, loss[loss=0.04546, simple_loss=0.05162, pruned_loss=0.008076, audio_tagging_loss=0.01157, over 14026.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08939, pruned_loss=0.01228, audio_tagging_loss=0.008647, over 3048109.50 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:55:16,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3461506.6666666665, ans=0.1 2023-11-26 15:55:37,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. 
limit=15.0 2023-11-26 15:55:39,027 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519250 2023-11-26 15:56:05,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0 2023-11-26 15:56:08,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3461773.3333333335, ans=0.1 2023-11-26 15:56:10,446 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2250, loss[loss=0.05539, simple_loss=0.06798, pruned_loss=0.01143, audio_tagging_loss=0.009969, over 14727.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08918, pruned_loss=0.0123, audio_tagging_loss=0.008677, over 3051690.88 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:56:22,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3461906.6666666665, ans=0.1 2023-11-26 15:56:25,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3461906.6666666665, ans=0.125 2023-11-26 15:56:35,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2023-11-26 15:56:35,494 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519300 2023-11-26 15:56:36,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3461973.3333333335, ans=0.0 2023-11-26 15:56:36,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3461973.3333333335, ans=0.1 2023-11-26 15:56:38,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2023-11-26 15:56:38,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=12.0 2023-11-26 15:56:40,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3461973.3333333335, ans=0.125 2023-11-26 15:56:57,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.53 vs. limit=15.0 2023-11-26 15:57:00,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.882e+01 9.360e+01 1.008e+02 1.473e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 15:57:06,967 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2300, loss[loss=0.04586, simple_loss=0.05688, pruned_loss=0.007496, audio_tagging_loss=0.009925, over 14349.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08941, pruned_loss=0.01236, audio_tagging_loss=0.008712, over 3045215.14 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:57:30,730 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519350 2023-11-26 15:57:36,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3462306.6666666665, ans=0.125 2023-11-26 15:57:36,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3462306.6666666665, ans=0.125 2023-11-26 15:57:38,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3462373.3333333335, ans=0.1 2023-11-26 15:57:50,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.72 vs. limit=15.0 2023-11-26 15:57:54,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.28 vs. limit=12.0 2023-11-26 15:57:56,350 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:58:02,755 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2350, loss[loss=0.05753, simple_loss=0.07512, pruned_loss=0.008395, audio_tagging_loss=0.01157, over 14939.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08974, pruned_loss=0.01235, audio_tagging_loss=0.008745, over 3042153.07 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:58:06,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3462506.6666666665, ans=0.125 2023-11-26 15:58:24,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3462640.0, ans=0.125 2023-11-26 15:58:25,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0 2023-11-26 15:58:26,790 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519400 2023-11-26 15:58:29,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3462640.0, ans=0.125 2023-11-26 15:58:30,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3462640.0, ans=0.125 2023-11-26 15:58:41,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-26 15:58:52,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.835e+01 9.579e+01 1.049e+02 1.967e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-26 15:58:58,920 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2400, loss[loss=0.06322, simple_loss=0.0873, pruned_loss=0.01157, audio_tagging_loss=0.00801, over 15327.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09021, pruned_loss=0.01235, audio_tagging_loss=0.008862, over 3044715.27 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:59:06,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3462840.0, ans=0.0 2023-11-26 15:59:22,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3462973.3333333335, ans=0.125 2023-11-26 15:59:23,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2023-11-26 15:59:24,259 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519450 2023-11-26 15:59:37,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3463040.0, ans=0.0 2023-11-26 15:59:38,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5 2023-11-26 15:59:42,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3463040.0, ans=0.125 2023-11-26 15:59:46,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3463106.6666666665, ans=0.015 2023-11-26 15:59:54,841 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2450, loss[loss=0.05799, simple_loss=0.08254, pruned_loss=0.008757, audio_tagging_loss=0.007963, over 15896.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08916, pruned_loss=0.01209, audio_tagging_loss=0.008934, over 3040989.59 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:00:12,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2023-11-26 16:00:20,165 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519500 2023-11-26 16:00:24,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3463306.6666666665, ans=0.1 2023-11-26 16:00:43,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3463440.0, ans=0.0 2023-11-26 16:00:46,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.797e+01 8.970e+01 9.604e+01 1.012e+02 1.518e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-26 16:00:47,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3463440.0, ans=0.125 2023-11-26 16:00:52,121 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2500, loss[loss=0.08682, simple_loss=0.1217, pruned_loss=0.01977, audio_tagging_loss=0.006213, over 15347.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08957, pruned_loss=0.01221, audio_tagging_loss=0.008995, over 3042171.44 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:00:54,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.53 vs. 
limit=22.5 2023-11-26 16:00:58,879 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:01:06,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3463573.3333333335, ans=0.125 2023-11-26 16:01:09,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3463573.3333333335, ans=0.125 2023-11-26 16:01:11,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2023-11-26 16:01:16,340 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519550 2023-11-26 16:01:31,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2023-11-26 16:01:47,778 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2550, loss[loss=0.05123, simple_loss=0.06524, pruned_loss=0.005698, audio_tagging_loss=0.01291, over 15459.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08901, pruned_loss=0.01218, audio_tagging_loss=0.008926, over 3043981.72 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:01:58,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2023-11-26 16:01:59,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3463906.6666666665, ans=0.125 2023-11-26 16:02:01,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3463906.6666666665, ans=0.125 2023-11-26 16:02:04,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3463906.6666666665, ans=0.0 2023-11-26 16:02:12,972 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519600 2023-11-26 16:02:21,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=22.5 2023-11-26 16:02:25,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.72 vs. limit=22.5 2023-11-26 16:02:40,011 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.652e+01 9.307e+01 9.985e+01 1.166e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 16:02:44,320 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2600, loss[loss=0.05794, simple_loss=0.08059, pruned_loss=0.008946, audio_tagging_loss=0.008699, over 16030.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.089, pruned_loss=0.01204, audio_tagging_loss=0.008793, over 3039550.47 frames. 
], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:02:49,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3464173.3333333335, ans=0.125 2023-11-26 16:02:53,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3464173.3333333335, ans=0.1 2023-11-26 16:02:54,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3464173.3333333335, ans=0.95 2023-11-26 16:02:57,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3464240.0, ans=10.0 2023-11-26 16:03:00,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3464240.0, ans=0.0 2023-11-26 16:03:00,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2023-11-26 16:03:02,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0 2023-11-26 16:03:09,612 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519650 2023-11-26 16:03:13,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.10 vs. limit=10.0 2023-11-26 16:03:40,915 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2650, loss[loss=0.07409, simple_loss=0.1062, pruned_loss=0.01287, audio_tagging_loss=0.008136, over 16188.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08972, pruned_loss=0.0121, audio_tagging_loss=0.008676, over 3048536.34 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:03:43,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-26 16:03:44,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3464506.6666666665, ans=0.125 2023-11-26 16:04:02,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3464640.0, ans=0.125 2023-11-26 16:04:05,219 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519700 2023-11-26 16:04:10,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3464640.0, ans=0.125 2023-11-26 16:04:16,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3464706.6666666665, ans=0.0 2023-11-26 16:04:32,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.790e+01 9.468e+01 1.013e+02 1.366e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 16:04:36,895 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2700, loss[loss=0.04877, simple_loss=0.05703, pruned_loss=0.007996, audio_tagging_loss=0.01226, over 16402.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09001, pruned_loss=0.01217, audio_tagging_loss=0.008641, over 3055549.20 frames. 
], batch size: 64, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:04:52,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2023-11-26 16:04:56,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3464906.6666666665, ans=0.125 2023-11-26 16:05:02,155 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519750 2023-11-26 16:05:05,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3464973.3333333335, ans=0.125 2023-11-26 16:05:09,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3465040.0, ans=0.0 2023-11-26 16:05:33,075 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2750, loss[loss=0.07712, simple_loss=0.1125, pruned_loss=0.01388, audio_tagging_loss=0.006975, over 16009.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09021, pruned_loss=0.01219, audio_tagging_loss=0.008625, over 3061024.21 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:05:49,182 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:05:57,560 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519800 2023-11-26 16:05:58,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3465306.6666666665, ans=0.1 2023-11-26 16:06:09,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0 2023-11-26 16:06:18,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.32 vs. limit=15.0 2023-11-26 16:06:19,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3465440.0, ans=0.125 2023-11-26 16:06:22,660 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:06:25,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.811e+01 9.283e+01 1.006e+02 1.287e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 16:06:29,590 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2800, loss[loss=0.04525, simple_loss=0.06554, pruned_loss=0.004757, audio_tagging_loss=0.007718, over 14329.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08993, pruned_loss=0.012, audio_tagging_loss=0.008596, over 3055718.34 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:06:45,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3465573.3333333335, ans=0.0 2023-11-26 16:06:54,066 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519850 2023-11-26 16:06:56,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3465640.0, ans=0.2 2023-11-26 16:07:05,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3465706.6666666665, ans=0.09899494936611666 2023-11-26 16:07:16,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3465773.3333333335, ans=0.2 2023-11-26 16:07:17,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3465773.3333333335, ans=0.125 2023-11-26 16:07:24,877 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2850, loss[loss=0.06282, simple_loss=0.09083, pruned_loss=0.008928, audio_tagging_loss=0.008476, over 15085.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08946, pruned_loss=0.01203, audio_tagging_loss=0.008593, over 3054046.07 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:07:35,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3465906.6666666665, ans=0.1 2023-11-26 16:07:50,755 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519900 2023-11-26 16:07:57,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3465973.3333333335, ans=0.2 2023-11-26 16:08:05,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3466040.0, ans=0.125 2023-11-26 16:08:08,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3466040.0, ans=0.0 2023-11-26 16:08:18,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.838e+01 9.404e+01 1.047e+02 1.303e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 16:08:21,800 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2900, loss[loss=0.07175, simple_loss=0.1003, pruned_loss=0.01139, audio_tagging_loss=0.01019, over 16608.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09021, pruned_loss=0.01203, audio_tagging_loss=0.008536, over 3055459.44 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:08:25,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.24 vs. limit=22.5 2023-11-26 16:08:39,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3466240.0, ans=0.125 2023-11-26 16:08:46,458 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519950 2023-11-26 16:08:57,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3466373.3333333335, ans=0.1 2023-11-26 16:08:59,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.87 vs. 
limit=10.0 2023-11-26 16:09:01,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3466373.3333333335, ans=0.2 2023-11-26 16:09:06,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3466440.0, ans=0.09899494936611666 2023-11-26 16:09:06,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3466440.0, ans=0.125 2023-11-26 16:09:08,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3466440.0, ans=0.0 2023-11-26 16:09:18,645 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2950, loss[loss=0.06248, simple_loss=0.08352, pruned_loss=0.009001, audio_tagging_loss=0.01172, over 14520.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09015, pruned_loss=0.01207, audio_tagging_loss=0.008583, over 3050291.37 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:09:27,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2023-11-26 16:09:35,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3466573.3333333335, ans=0.125 2023-11-26 16:09:43,063 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520000 2023-11-26 16:09:48,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2023-11-26 16:09:48,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3466640.0, ans=0.125 2023-11-26 16:09:58,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3466706.6666666665, ans=0.2 2023-11-26 16:09:59,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3466706.6666666665, ans=0.125 2023-11-26 16:10:02,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2023-11-26 16:10:12,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.851e+01 9.554e+01 1.014e+02 1.213e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 16:10:15,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3466840.0, ans=0.125 2023-11-26 16:10:16,219 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3000, loss[loss=0.05148, simple_loss=0.06643, pruned_loss=0.009325, audio_tagging_loss=0.008944, over 14464.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09118, pruned_loss=0.01224, audio_tagging_loss=0.008511, over 3049451.58 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:10:16,220 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 16:10:39,330 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9639, 3.1594, 2.9214, 3.1422, 3.3874, 2.7593, 3.4262, 2.6151], device='cuda:1') 2023-11-26 16:10:48,829 INFO [train_asr.py:1267] (1/4) Epoch 44, validation: loss=0.05748, simple_loss=0.05058, pruned_loss=0.005287, audio_tagging_loss=0.02691, over 4681554.00 frames. 2023-11-26 16:10:48,830 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 16:11:13,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520050 2023-11-26 16:11:18,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.61 vs. limit=15.0 2023-11-26 16:11:28,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3467040.0, ans=0.2 2023-11-26 16:11:39,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3467106.6666666665, ans=0.0 2023-11-26 16:11:40,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3467106.6666666665, ans=0.0 2023-11-26 16:11:42,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0 2023-11-26 16:11:45,575 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3050, loss[loss=0.06286, simple_loss=0.08714, pruned_loss=0.01204, audio_tagging_loss=0.007247, over 16008.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09077, pruned_loss=0.01233, audio_tagging_loss=0.008598, over 3043995.20 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:11:54,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3467173.3333333335, ans=0.0 2023-11-26 16:12:02,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0 2023-11-26 16:12:09,697 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520100 2023-11-26 16:12:10,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-26 16:12:19,142 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:12:29,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3467440.0, ans=0.125 2023-11-26 16:12:33,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. 
limit=10.0 2023-11-26 16:12:37,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.992e+01 9.720e+01 1.054e+02 1.278e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-26 16:12:41,146 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3100, loss[loss=0.07032, simple_loss=0.1009, pruned_loss=0.01069, audio_tagging_loss=0.009181, over 16528.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09103, pruned_loss=0.01233, audio_tagging_loss=0.008634, over 3052244.31 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:12:51,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3467573.3333333335, ans=0.1 2023-11-26 16:13:06,288 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520150 2023-11-26 16:13:08,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3467640.0, ans=0.125 2023-11-26 16:13:12,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2023-11-26 16:13:14,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3467706.6666666665, ans=0.1 2023-11-26 16:13:21,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3467706.6666666665, ans=0.0 2023-11-26 16:13:28,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3467773.3333333335, ans=0.125 2023-11-26 16:13:37,163 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3150, loss[loss=0.06659, simple_loss=0.08292, pruned_loss=0.01586, audio_tagging_loss=0.009271, over 14043.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09116, pruned_loss=0.01249, audio_tagging_loss=0.008716, over 3050052.52 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 16:13:45,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3467840.0, ans=0.07 2023-11-26 16:13:48,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3467906.6666666665, ans=0.015 2023-11-26 16:13:55,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2023-11-26 16:14:01,388 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520200 2023-11-26 16:14:12,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3468040.0, ans=0.125 2023-11-26 16:14:17,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.61 vs. limit=22.5 2023-11-26 16:14:31,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 8.852e+01 9.512e+01 1.032e+02 1.320e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 16:14:33,246 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3200, loss[loss=0.0551, simple_loss=0.06572, pruned_loss=0.01189, audio_tagging_loss=0.01034, over 14199.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09101, pruned_loss=0.01239, audio_tagging_loss=0.008839, over 3050355.37 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:14:33,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.12 vs. limit=15.0 2023-11-26 16:14:56,924 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520250 2023-11-26 16:15:07,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3468373.3333333335, ans=0.0 2023-11-26 16:15:11,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3468373.3333333335, ans=0.125 2023-11-26 16:15:12,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3468373.3333333335, ans=0.09899494936611666 2023-11-26 16:15:28,512 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3250, loss[loss=0.06933, simple_loss=0.09394, pruned_loss=0.01278, audio_tagging_loss=0.009585, over 15509.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09069, pruned_loss=0.01223, audio_tagging_loss=0.008922, over 3049006.40 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:15:49,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=12.0 2023-11-26 16:15:49,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3468640.0, ans=0.125 2023-11-26 16:15:54,098 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520300 2023-11-26 16:15:55,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2023-11-26 16:15:57,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3468640.0, ans=0.0 2023-11-26 16:15:58,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3468640.0, ans=0.0 2023-11-26 16:16:14,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0 2023-11-26 16:16:15,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3468773.3333333335, ans=0.0 2023-11-26 16:16:21,812 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.992e+01 9.345e+01 1.022e+02 1.465e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 16:16:23,913 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3300, loss[loss=0.09005, simple_loss=0.1306, pruned_loss=0.01853, audio_tagging_loss=0.006239, over 15457.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09042, pruned_loss=0.0122, audio_tagging_loss=0.009004, over 3041176.80 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:16:29,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3468840.0, ans=0.125 2023-11-26 16:16:41,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3468906.6666666665, ans=0.125 2023-11-26 16:16:44,366 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:16:49,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520350 2023-11-26 16:16:53,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3468973.3333333335, ans=0.125 2023-11-26 16:17:00,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2023-11-26 16:17:13,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3469106.6666666665, ans=0.125 2023-11-26 16:17:21,082 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3350, loss[loss=0.05908, simple_loss=0.08477, pruned_loss=0.006657, audio_tagging_loss=0.01004, over 15139.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08939, pruned_loss=0.01199, audio_tagging_loss=0.008903, over 3041296.45 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:17:23,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3469173.3333333335, ans=0.2 2023-11-26 16:17:23,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3469173.3333333335, ans=0.125 2023-11-26 16:17:28,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3469173.3333333335, ans=10.0 2023-11-26 16:17:28,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3469173.3333333335, ans=0.125 2023-11-26 16:17:33,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3469240.0, ans=0.125 2023-11-26 16:17:33,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3469240.0, ans=10.0 2023-11-26 16:17:44,300 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520400 2023-11-26 16:17:51,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3469306.6666666665, ans=0.1 2023-11-26 16:18:03,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3469373.3333333335, ans=0.0 2023-11-26 16:18:13,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.575e+01 9.293e+01 1.025e+02 1.253e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 16:18:15,814 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3400, loss[loss=0.08475, simple_loss=0.1224, pruned_loss=0.01637, audio_tagging_loss=0.007179, over 15842.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.0898, pruned_loss=0.01203, audio_tagging_loss=0.008781, over 3040671.40 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:18:40,620 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520450 2023-11-26 16:18:46,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3469640.0, ans=0.125 2023-11-26 16:19:05,963 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:19:10,990 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3450, loss[loss=0.04896, simple_loss=0.06569, pruned_loss=0.009195, audio_tagging_loss=0.006925, over 14328.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08927, pruned_loss=0.01203, audio_tagging_loss=0.008836, over 3038010.55 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:19:33,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=15.0 2023-11-26 16:19:36,324 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520500 2023-11-26 16:19:46,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3470040.0, ans=0.125 2023-11-26 16:19:49,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3470040.0, ans=0.125 2023-11-26 16:20:05,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.921e+01 9.582e+01 1.025e+02 1.197e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 16:20:05,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2023-11-26 16:20:07,465 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3500, loss[loss=0.06682, simple_loss=0.09456, pruned_loss=0.0106, audio_tagging_loss=0.008942, over 14846.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09012, pruned_loss=0.01222, audio_tagging_loss=0.008748, over 3040481.38 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:20:31,638 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520550 2023-11-26 16:20:35,904 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:20:38,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=12.0 2023-11-26 16:20:55,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3470440.0, ans=0.1 2023-11-26 16:21:03,047 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3550, loss[loss=0.06015, simple_loss=0.08068, pruned_loss=0.01223, audio_tagging_loss=0.007584, over 16218.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08989, pruned_loss=0.01229, audio_tagging_loss=0.008708, over 3043559.70 frames. 
], batch size: 63, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:21:06,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3470506.6666666665, ans=0.125 2023-11-26 16:21:26,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.64 vs. limit=10.0 2023-11-26 16:21:27,161 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520600 2023-11-26 16:21:37,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3470706.6666666665, ans=0.1 2023-11-26 16:21:51,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3470773.3333333335, ans=0.2 2023-11-26 16:21:55,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0 2023-11-26 16:21:55,834 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.906e+01 9.475e+01 1.015e+02 1.360e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 16:21:56,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3470773.3333333335, ans=0.0 2023-11-26 16:21:57,992 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3600, loss[loss=0.06891, simple_loss=0.08759, pruned_loss=0.01508, audio_tagging_loss=0.01004, over 15049.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08995, pruned_loss=0.01234, audio_tagging_loss=0.008788, over 3042500.42 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:22:22,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3470973.3333333335, ans=0.125 2023-11-26 16:22:23,261 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520650 2023-11-26 16:22:24,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-11-26 16:22:31,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3471040.0, ans=0.2 2023-11-26 16:22:54,296 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3650, loss[loss=0.07973, simple_loss=0.1068, pruned_loss=0.01725, audio_tagging_loss=0.009096, over 14656.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08926, pruned_loss=0.0122, audio_tagging_loss=0.008732, over 3046151.22 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:23:07,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=10.0 2023-11-26 16:23:18,325 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520700 2023-11-26 16:23:34,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. 
limit=15.0 2023-11-26 16:23:39,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3471440.0, ans=0.0 2023-11-26 16:23:40,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3471440.0, ans=0.125 2023-11-26 16:23:47,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 8.866e+01 9.412e+01 1.015e+02 1.534e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 16:23:49,726 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3700, loss[loss=0.06737, simple_loss=0.08349, pruned_loss=0.01595, audio_tagging_loss=0.009678, over 15641.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08973, pruned_loss=0.01236, audio_tagging_loss=0.008701, over 3044014.08 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:24:00,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3471573.3333333335, ans=0.04949747468305833 2023-11-26 16:24:03,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2023-11-26 16:24:13,872 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520750 2023-11-26 16:24:44,728 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3750, loss[loss=0.06379, simple_loss=0.08971, pruned_loss=0.01041, audio_tagging_loss=0.008523, over 15181.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09115, pruned_loss=0.01258, audio_tagging_loss=0.008659, over 3043185.23 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:24:55,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3471906.6666666665, ans=0.09899494936611666 2023-11-26 16:24:55,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.37 vs. limit=12.0 2023-11-26 16:25:02,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3471906.6666666665, ans=0.0 2023-11-26 16:25:08,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-11-26 16:25:09,406 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520800 2023-11-26 16:25:16,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3471973.3333333335, ans=0.125 2023-11-26 16:25:22,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3472040.0, ans=0.125 2023-11-26 16:25:24,028 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 16:25:29,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3472106.6666666665, ans=0.125 2023-11-26 16:25:39,676 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.098e+01 9.695e+01 1.024e+02 1.279e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-26 16:25:40,798 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3800, loss[loss=0.06036, simple_loss=0.08001, pruned_loss=0.01036, audio_tagging_loss=0.009993, over 15212.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09125, pruned_loss=0.01248, audio_tagging_loss=0.008749, over 3045810.65 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:25:49,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3472173.3333333335, ans=0.09899494936611666 2023-11-26 16:26:03,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3472306.6666666665, ans=0.0 2023-11-26 16:26:05,843 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520850 2023-11-26 16:26:07,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3472306.6666666665, ans=0.1 2023-11-26 16:26:15,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3472373.3333333335, ans=0.025 2023-11-26 16:26:20,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3472373.3333333335, ans=0.1 2023-11-26 16:26:22,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3472373.3333333335, ans=0.04949747468305833 2023-11-26 16:26:36,568 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3850, loss[loss=0.07649, simple_loss=0.1026, pruned_loss=0.01742, audio_tagging_loss=0.00775, over 14596.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09139, pruned_loss=0.0125, audio_tagging_loss=0.008687, over 3044140.04 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:26:40,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0 2023-11-26 16:26:42,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3472506.6666666665, ans=0.125 2023-11-26 16:26:59,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3472640.0, ans=0.1 2023-11-26 16:27:01,210 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520900 2023-11-26 16:27:12,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3472706.6666666665, ans=0.0 2023-11-26 16:27:19,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.96 vs. 
limit=15.0 2023-11-26 16:27:22,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3472773.3333333335, ans=0.125 2023-11-26 16:27:30,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 8.861e+01 9.516e+01 1.005e+02 1.326e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 16:27:32,024 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3900, loss[loss=0.06602, simple_loss=0.08647, pruned_loss=0.01342, audio_tagging_loss=0.009365, over 16517.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09096, pruned_loss=0.01253, audio_tagging_loss=0.008823, over 3045536.55 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:27:36,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3472840.0, ans=0.025 2023-11-26 16:27:39,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3472840.0, ans=0.1 2023-11-26 16:27:45,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3472906.6666666665, ans=0.0 2023-11-26 16:27:46,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3472906.6666666665, ans=0.1 2023-11-26 16:27:57,180 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520950 2023-11-26 16:27:59,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3472973.3333333335, ans=0.125 2023-11-26 16:28:08,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3473040.0, ans=0.125 2023-11-26 16:28:17,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3473106.6666666665, ans=0.125 2023-11-26 16:28:22,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3473106.6666666665, ans=0.125 2023-11-26 16:28:28,142 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3950, loss[loss=0.06785, simple_loss=0.08746, pruned_loss=0.01548, audio_tagging_loss=0.008639, over 15484.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09045, pruned_loss=0.01246, audio_tagging_loss=0.008932, over 3045013.65 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:28:31,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3473173.3333333335, ans=0.1 2023-11-26 16:28:47,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3473240.0, ans=0.125 2023-11-26 16:28:51,996 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521000 2023-11-26 16:29:03,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3473373.3333333335, ans=0.025 2023-11-26 16:29:22,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.021e+01 9.497e+01 1.017e+02 1.308e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 16:29:23,983 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4000, loss[loss=0.08951, simple_loss=0.1219, pruned_loss=0.02112, audio_tagging_loss=0.007459, over 17059.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09068, pruned_loss=0.01255, audio_tagging_loss=0.008953, over 3048481.54 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:29:48,264 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521050 2023-11-26 16:30:19,801 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4050, loss[loss=0.05512, simple_loss=0.06844, pruned_loss=0.01046, audio_tagging_loss=0.01044, over 14647.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.0904, pruned_loss=0.01254, audio_tagging_loss=0.008942, over 3052121.29 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:30:23,564 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:30:24,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3473840.0, ans=0.1 2023-11-26 16:30:44,963 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521100 2023-11-26 16:30:54,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3474040.0, ans=0.1 2023-11-26 16:31:07,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3474106.6666666665, ans=0.0 2023-11-26 16:31:13,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3474106.6666666665, ans=0.125 2023-11-26 16:31:14,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 8.820e+01 9.465e+01 1.007e+02 1.196e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 16:31:15,937 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4100, loss[loss=0.06305, simple_loss=0.09126, pruned_loss=0.009489, audio_tagging_loss=0.007937, over 15822.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09075, pruned_loss=0.01251, audio_tagging_loss=0.008857, over 3049927.75 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:31:40,671 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521150 2023-11-26 16:31:55,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3474373.3333333335, ans=0.0 2023-11-26 16:32:01,115 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:32:12,711 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4150, loss[loss=0.05681, simple_loss=0.08077, pruned_loss=0.01064, audio_tagging_loss=0.005783, over 15961.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09135, pruned_loss=0.01267, audio_tagging_loss=0.008727, over 3046041.60 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:32:21,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3474506.6666666665, ans=0.1 2023-11-26 16:32:36,823 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521200 2023-11-26 16:32:38,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3474640.0, ans=0.1 2023-11-26 16:32:54,690 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:32:56,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2023-11-26 16:33:06,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3474773.3333333335, ans=0.125 2023-11-26 16:33:07,360 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 9.007e+01 9.363e+01 1.013e+02 1.321e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 16:33:08,497 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4200, loss[loss=0.05977, simple_loss=0.08953, pruned_loss=0.007561, audio_tagging_loss=0.007443, over 14938.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09063, pruned_loss=0.01249, audio_tagging_loss=0.008732, over 3053056.14 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:33:10,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2023-11-26 16:33:30,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3474973.3333333335, ans=0.125 2023-11-26 16:33:33,747 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521250 2023-11-26 16:33:34,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.73 vs. 
limit=22.5 2023-11-26 16:33:45,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3475040.0, ans=0.0 2023-11-26 16:33:54,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0 2023-11-26 16:34:04,108 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4250, loss[loss=0.05873, simple_loss=0.09153, pruned_loss=0.007296, audio_tagging_loss=0.005672, over 15340.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09151, pruned_loss=0.01267, audio_tagging_loss=0.008581, over 3047430.41 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:34:09,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-26 16:34:25,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3475306.6666666665, ans=0.1 2023-11-26 16:34:28,607 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521300 2023-11-26 16:34:37,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5 2023-11-26 16:34:45,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3475373.3333333335, ans=0.2 2023-11-26 16:34:59,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 8.954e+01 9.475e+01 1.016e+02 1.438e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 16:35:00,223 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4300, loss[loss=0.07713, simple_loss=0.1082, pruned_loss=0.01486, audio_tagging_loss=0.008154, over 14839.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09125, pruned_loss=0.01269, audio_tagging_loss=0.008554, over 3044999.99 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:35:08,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-26 16:35:09,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3475573.3333333335, ans=0.0 2023-11-26 16:35:23,613 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521350 2023-11-26 16:35:45,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3475773.3333333335, ans=0.125 2023-11-26 16:35:55,108 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4350, loss[loss=0.06665, simple_loss=0.09829, pruned_loss=0.009179, audio_tagging_loss=0.008332, over 15474.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09118, pruned_loss=0.01269, audio_tagging_loss=0.008541, over 3051014.33 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:35:55,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3475840.0, ans=0.125 2023-11-26 16:36:04,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3475906.6666666665, ans=0.125 2023-11-26 16:36:07,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-26 16:36:11,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3475906.6666666665, ans=0.125 2023-11-26 16:36:19,531 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521400 2023-11-26 16:36:21,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=15.0 2023-11-26 16:36:23,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3475973.3333333335, ans=0.125 2023-11-26 16:36:25,908 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:36:45,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.23 vs. limit=15.0 2023-11-26 16:36:47,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3476106.6666666665, ans=0.1 2023-11-26 16:36:47,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3476106.6666666665, ans=0.125 2023-11-26 16:36:49,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.904e+01 9.369e+01 1.014e+02 1.389e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 16:36:50,194 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4400, loss[loss=0.06646, simple_loss=0.09043, pruned_loss=0.01187, audio_tagging_loss=0.009374, over 14741.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09167, pruned_loss=0.01274, audio_tagging_loss=0.008515, over 3049607.95 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:36:50,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=22.5 2023-11-26 16:36:57,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3476173.3333333335, ans=0.2 2023-11-26 16:37:12,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3476306.6666666665, ans=0.125 2023-11-26 16:37:14,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521450 2023-11-26 16:37:41,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3476440.0, ans=0.1 2023-11-26 16:37:46,787 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4450, loss[loss=0.06436, simple_loss=0.08711, pruned_loss=0.01085, audio_tagging_loss=0.009957, over 15524.00 frames. 
], tot_loss[loss=0.06665, simple_loss=0.09105, pruned_loss=0.01268, audio_tagging_loss=0.008447, over 3045449.30 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:37:49,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3476506.6666666665, ans=0.1 2023-11-26 16:37:49,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0 2023-11-26 16:37:54,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3476506.6666666665, ans=0.125 2023-11-26 16:37:57,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3476573.3333333335, ans=0.1 2023-11-26 16:38:10,255 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521500 2023-11-26 16:38:13,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3476640.0, ans=0.1 2023-11-26 16:38:26,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3476706.6666666665, ans=0.125 2023-11-26 16:38:27,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3476706.6666666665, ans=0.125 2023-11-26 16:38:40,517 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.912e+01 9.426e+01 1.014e+02 1.545e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 16:38:41,577 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4500, loss[loss=0.0528, simple_loss=0.06788, pruned_loss=0.006515, audio_tagging_loss=0.01235, over 16005.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09085, pruned_loss=0.01257, audio_tagging_loss=0.00843, over 3048022.70 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:38:45,047 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:38:46,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3476840.0, ans=0.2 2023-11-26 16:38:54,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3476906.6666666665, ans=0.1 2023-11-26 16:38:58,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3476906.6666666665, ans=0.125 2023-11-26 16:39:05,329 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521550 2023-11-26 16:39:29,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3477106.6666666665, ans=0.1 2023-11-26 16:39:32,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3477106.6666666665, ans=0.125 2023-11-26 16:39:36,119 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4550, loss[loss=0.07099, simple_loss=0.1067, pruned_loss=0.01249, audio_tagging_loss=0.005125, over 15928.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09121, pruned_loss=0.01251, audio_tagging_loss=0.008436, over 3051332.63 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:39:36,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3477173.3333333335, ans=0.125 2023-11-26 16:39:45,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3477173.3333333335, ans=0.2 2023-11-26 16:40:01,313 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521600 2023-11-26 16:40:04,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3477306.6666666665, ans=0.125 2023-11-26 16:40:14,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3477373.3333333335, ans=0.125 2023-11-26 16:40:20,739 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:40:29,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3477440.0, ans=0.0 2023-11-26 16:40:30,851 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.935e+01 8.644e+01 9.356e+01 1.024e+02 1.228e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 16:40:31,932 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4600, loss[loss=0.0719, simple_loss=0.09185, pruned_loss=0.016, audio_tagging_loss=0.009969, over 14507.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09082, pruned_loss=0.01254, audio_tagging_loss=0.008544, over 3047941.55 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:40:50,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3477573.3333333335, ans=0.125 2023-11-26 16:40:57,094 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521650 2023-11-26 16:41:00,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3477640.0, ans=0.04949747468305833 2023-11-26 16:41:14,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.63 vs. limit=15.0 2023-11-26 16:41:22,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-26 16:41:25,811 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:41:28,642 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4650, loss[loss=0.07478, simple_loss=0.1059, pruned_loss=0.01318, audio_tagging_loss=0.008639, over 15566.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09069, pruned_loss=0.01237, audio_tagging_loss=0.008596, over 3047016.65 frames. 
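], batch size: 55, lr: 1.54e-03, grad_scale: 16.0

The WARNING entries above (train_asr.py:1481) drop AudioSet cuts whose encoder output is too short for their token sequence: 100 input frames subsample to 23, which cannot cover the 24 BPE tokens of the dummy transcript, so a transducer-style loss would be undefined. A minimal sketch of such a filter, assuming a plain ~4x convolutional subsampling (the exact arithmetic in train_asr.py may differ):

# Hypothetical re-creation of the cut filter implied by the WARNING lines.
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed ~4x conv subsampling; reproduces the logged 100 -> 23.
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Transducer-style losses need at least one frame per output token.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded 1-second AudioSet cuts above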
2023-11-26 16:41:32,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3477840.0, ans=0.125 2023-11-26 16:41:37,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=12.0 2023-11-26 16:41:38,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3477906.6666666665, ans=0.04949747468305833 2023-11-26 16:41:40,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3477906.6666666665, ans=0.125 2023-11-26 16:41:52,951 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521700 2023-11-26 16:41:56,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3477973.3333333335, ans=0.125 2023-11-26 16:42:02,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3478040.0, ans=0.125 2023-11-26 16:42:08,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2023-11-26 16:42:21,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0 2023-11-26 16:42:23,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.839e+01 9.612e+01 1.022e+02 1.375e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 16:42:23,765 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4700, loss[loss=0.06648, simple_loss=0.09204, pruned_loss=0.013, audio_tagging_loss=0.007454, over 16985.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09017, pruned_loss=0.01238, audio_tagging_loss=0.008768, over 3053107.42 frames. ], batch size: 65, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:42:42,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3478240.0, ans=0.125 2023-11-26 16:42:43,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3478240.0, ans=0.0 2023-11-26 16:42:44,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2023-11-26 16:42:49,020 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521750 2023-11-26 16:42:57,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=12.0 2023-11-26 16:42:58,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3478373.3333333335, ans=0.125 2023-11-26 16:43:11,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3478440.0, ans=0.1 2023-11-26 16:43:13,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3478440.0, ans=0.0 2023-11-26 16:43:19,244 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4750, loss[loss=0.05979, simple_loss=0.07672, pruned_loss=0.0127, audio_tagging_loss=0.008726, over 16628.00 frames.
], tot_loss[loss=0.06628, simple_loss=0.09018, pruned_loss=0.01237, audio_tagging_loss=0.008822, over 3051548.44 frames. ], batch size: 65, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:43:24,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3478506.6666666665, ans=0.125 2023-11-26 16:43:31,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3478573.3333333335, ans=0.1 2023-11-26 16:43:43,699 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521800 2023-11-26 16:43:48,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3478640.0, ans=0.125 2023-11-26 16:44:02,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3478773.3333333335, ans=0.05 2023-11-26 16:44:12,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3478773.3333333335, ans=0.125 2023-11-26 16:44:15,711 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4800, loss[loss=0.06551, simple_loss=0.08993, pruned_loss=0.0104, audio_tagging_loss=0.01015, over 14744.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09099, pruned_loss=0.01256, audio_tagging_loss=0.008793, over 3045228.07 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:44:16,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.936e+01 9.415e+01 1.023e+02 1.286e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 16:44:22,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.70 vs. limit=15.0 2023-11-26 16:44:35,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0 2023-11-26 16:44:39,434 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521850 2023-11-26 16:44:43,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3478973.3333333335, ans=0.1 2023-11-26 16:44:54,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3479040.0, ans=0.125 2023-11-26 16:44:58,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3479040.0, ans=0.125 2023-11-26 16:45:11,482 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4850, loss[loss=0.06335, simple_loss=0.09216, pruned_loss=0.009849, audio_tagging_loss=0.007423, over 14860.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09033, pruned_loss=0.01253, audio_tagging_loss=0.00887, over 3039667.83 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:45:14,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3479173.3333333335, ans=0.125 2023-11-26 16:45:36,675 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521900 2023-11-26 16:45:52,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3479373.3333333335, ans=0.2 2023-11-26 16:45:56,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3479440.0, ans=0.125 2023-11-26 16:46:07,655 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4900, loss[loss=0.08237, simple_loss=0.1271, pruned_loss=0.01341, audio_tagging_loss=0.005426, over 16536.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09057, pruned_loss=0.01258, audio_tagging_loss=0.008927, over 3045077.89 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:46:08,676 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.681e+01 9.501e+01 1.005e+02 1.327e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 16:46:32,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2023-11-26 16:46:32,674 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521950 2023-11-26 16:46:33,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3479640.0, ans=0.125 2023-11-26 16:46:41,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3479706.6666666665, ans=0.1 2023-11-26 16:47:03,991 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4950, loss[loss=0.05887, simple_loss=0.08247, pruned_loss=0.007562, audio_tagging_loss=0.01008, over 15341.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09029, pruned_loss=0.01254, audio_tagging_loss=0.008816, over 3038629.14 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:47:27,912 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522000 2023-11-26 16:47:29,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0 2023-11-26 16:47:43,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3480040.0, ans=0.1 2023-11-26 16:48:00,020 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5000, loss[loss=0.07228, simple_loss=0.09501, pruned_loss=0.016, audio_tagging_loss=0.008774, over 15398.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09023, pruned_loss=0.01257, audio_tagging_loss=0.008704, over 3036892.02 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:48:01,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.862e+01 9.666e+01 1.035e+02 1.226e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 16:48:14,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3480240.0, ans=0.2 2023-11-26 16:48:19,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3480240.0, ans=0.125 2023-11-26 16:48:19,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-11-26 16:48:25,363 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522050 2023-11-26 16:48:31,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0 2023-11-26 16:48:41,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3480373.3333333335, ans=0.5 2023-11-26 16:48:55,718 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5050, loss[loss=0.05234, simple_loss=0.07689, pruned_loss=0.006209, audio_tagging_loss=0.007694, over 15309.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09014, pruned_loss=0.01245, audio_tagging_loss=0.008649, over 3040433.17 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:49:15,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3480573.3333333335, ans=0.125 2023-11-26 16:49:15,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3480573.3333333335, ans=0.0 2023-11-26 16:49:21,061 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522100 2023-11-26 16:49:50,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-26 16:49:53,102 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5100, loss[loss=0.07487, simple_loss=0.09445, pruned_loss=0.01718, audio_tagging_loss=0.01047, over 15142.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08979, pruned_loss=0.0124, audio_tagging_loss=0.00872, over 3049312.95 frames. 
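], batch size: 56, lr: 1.54e-03, grad_scale: 16.0

Each optim.py:476 entry reports five grad-norm quantiles (min, 25%, median, 75%, max) over recent optimizer steps, and in every entry here the logged threshold equals Clipping_scale times the median, e.g. 2.0 * 9.666e+01 = 1.933e+02 in the entry above. A sketch of that bookkeeping under that reading; the window size and quantile method are assumptions, not icefall's actual optim.py:

import torch

def clipping_stats(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Five-number summary of a window of recent gradient norms.
    quartiles = torch.quantile(recent_grad_norms,
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]  # scale times the median
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped

# Feeding in the quartile values logged above reproduces its threshold:
_, threshold, pct = clipping_stats(torch.tensor([77.74, 88.62, 96.66, 103.5, 122.6]))
assert abs(threshold.item() - 193.3) < 0.1  # the logged threshold=1.933e+02
assert pct.item() == 0.0                    # matches percent-clipped=0.0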
2023-11-26 16:49:54,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.678e+01 9.277e+01 1.001e+02 1.240e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 16:49:58,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3480840.0, ans=0.1 2023-11-26 16:50:03,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3480906.6666666665, ans=0.125 2023-11-26 16:50:10,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3480906.6666666665, ans=0.125 2023-11-26 16:50:17,109 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522150 2023-11-26 16:50:36,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3481040.0, ans=10.0 2023-11-26 16:50:46,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2023-11-26 16:50:48,532 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5150, loss[loss=0.05683, simple_loss=0.07687, pruned_loss=0.009347, audio_tagging_loss=0.009045, over 15249.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08924, pruned_loss=0.01219, audio_tagging_loss=0.008679, over 3047416.88 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:50:53,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3481173.3333333335, ans=0.1 2023-11-26 16:51:04,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2023-11-26 16:51:11,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3481306.6666666665, ans=0.125 2023-11-26 16:51:14,131 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522200 2023-11-26 16:51:32,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3481440.0, ans=0.2 2023-11-26 16:51:44,720 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5200, loss[loss=0.06742, simple_loss=0.0956, pruned_loss=0.01268, audio_tagging_loss=0.006941, over 15207.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0899, pruned_loss=0.01216, audio_tagging_loss=0.008604, over 3049983.78 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:51:45,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.774e+01 9.486e+01 1.034e+02 1.875e+02, threshold=1.897e+02, percent-clipped=1.0 2023-11-26 16:52:09,790 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522250 2023-11-26 16:52:28,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs.
limit=15.0 2023-11-26 16:52:32,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3481773.3333333335, ans=0.0 2023-11-26 16:52:34,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3481773.3333333335, ans=0.125 2023-11-26 16:52:41,991 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5250, loss[loss=0.06887, simple_loss=0.09466, pruned_loss=0.0133, audio_tagging_loss=0.008236, over 15646.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09076, pruned_loss=0.01227, audio_tagging_loss=0.008567, over 3058536.00 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:52:45,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3481840.0, ans=0.0 2023-11-26 16:52:55,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=22.5 2023-11-26 16:53:00,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3481906.6666666665, ans=0.125 2023-11-26 16:53:04,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3481973.3333333335, ans=0.1 2023-11-26 16:53:05,973 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522300 2023-11-26 16:53:36,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3482173.3333333335, ans=0.125 2023-11-26 16:53:37,445 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5300, loss[loss=0.05584, simple_loss=0.07554, pruned_loss=0.008178, audio_tagging_loss=0.009898, over 15645.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09053, pruned_loss=0.01231, audio_tagging_loss=0.008556, over 3057713.01 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:53:39,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.820e+01 9.463e+01 1.024e+02 1.274e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 16:54:02,610 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522350 2023-11-26 16:54:11,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3482373.3333333335, ans=0.1 2023-11-26 16:54:23,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3482440.0, ans=0.125 2023-11-26 16:54:33,244 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5350, loss[loss=0.06681, simple_loss=0.08669, pruned_loss=0.01556, audio_tagging_loss=0.007911, over 14674.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.0913, pruned_loss=0.01254, audio_tagging_loss=0.008495, over 3050990.44 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:54:35,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3482506.6666666665, ans=0.125 2023-11-26 16:54:41,492 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:54:42,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3482506.6666666665, ans=0.125 2023-11-26 16:54:50,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-11-26 16:54:52,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3482573.3333333335, ans=0.0 2023-11-26 16:54:58,398 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522400 2023-11-26 16:54:58,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3482640.0, ans=0.125 2023-11-26 16:55:30,685 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5400, loss[loss=0.0715, simple_loss=0.104, pruned_loss=0.01076, audio_tagging_loss=0.00874, over 16965.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.0911, pruned_loss=0.01251, audio_tagging_loss=0.008579, over 3047392.81 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:55:32,795 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.955e+01 9.129e+01 9.512e+01 1.019e+02 1.244e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 16:55:50,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3482906.6666666665, ans=0.125 2023-11-26 16:55:54,156 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522450 2023-11-26 16:56:17,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=12.0 2023-11-26 16:56:26,053 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5450, loss[loss=0.05077, simple_loss=0.06651, pruned_loss=0.006949, audio_tagging_loss=0.01057, over 14781.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09002, pruned_loss=0.01242, audio_tagging_loss=0.008645, over 3046509.25 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:56:50,415 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522500 2023-11-26 16:56:59,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3483373.3333333335, ans=0.125 2023-11-26 16:57:00,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3483373.3333333335, ans=0.125 2023-11-26 16:57:02,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3483373.3333333335, ans=0.125 2023-11-26 16:57:05,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3483373.3333333335, ans=0.1 2023-11-26 16:57:13,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3483440.0, ans=0.2 2023-11-26 16:57:21,245 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5500, loss[loss=0.05994, simple_loss=0.07959, pruned_loss=0.01168, audio_tagging_loss=0.008463, over 15341.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09038, pruned_loss=0.01247, audio_tagging_loss=0.008637, over 3051815.13 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:57:23,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.753e+01 9.597e+01 1.033e+02 1.583e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 16:57:23,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3483506.6666666665, ans=0.05 2023-11-26 16:57:27,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2023-11-26 16:57:27,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0 2023-11-26 16:57:33,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3483573.3333333335, ans=0.125 2023-11-26 16:57:46,848 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522550 2023-11-26 16:57:46,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3483640.0, ans=0.0 2023-11-26 16:57:48,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3483640.0, ans=0.125 2023-11-26 16:57:50,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3483640.0, ans=0.0 2023-11-26 16:58:02,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3483706.6666666665, ans=0.09899494936611666 2023-11-26 16:58:13,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3483773.3333333335, ans=0.1 2023-11-26 16:58:15,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.46 vs. 
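limit=10.0

The Whitening entries (scaling.py:1022) compare a per-module statistic against a limit, as in metric=5.46 vs. limit=10.0 just above; when the metric exceeds its limit, the module's activations are nudged back toward an identity-like (white) covariance. One statistic consistent with these logs is the eigenvalue-spread ratio mean(lambda^2) / mean(lambda)^2 of the feature covariance, which is 1.0 for perfectly whitened features; the sketch below is an assumption about what is measured, not a copy of scaling.py:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations of one whitening group.
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2  # 1.0 when cov is isotropic

white = torch.randn(10000, 192)
assert whitening_metric(white).item() < 1.5  # near 1.0 for white noise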
2023-11-26 16:58:18,145 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5550, loss[loss=0.04763, simple_loss=0.0586, pruned_loss=0.007131, audio_tagging_loss=0.0112, over 14299.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09099, pruned_loss=0.01247, audio_tagging_loss=0.008649, over 3048415.04 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:58:27,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3483840.0, ans=0.1 2023-11-26 16:58:28,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=12.0 2023-11-26 16:58:29,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3483906.6666666665, ans=0.0 2023-11-26 16:58:31,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3483906.6666666665, ans=0.5 2023-11-26 16:58:41,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522600 2023-11-26 16:58:45,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3483973.3333333335, ans=0.125 2023-11-26 16:58:50,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3484040.0, ans=0.1 2023-11-26 16:58:56,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3484040.0, ans=0.125 2023-11-26 16:59:02,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2023-11-26 16:59:07,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-26 16:59:13,978 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5600, loss[loss=0.06923, simple_loss=0.0998, pruned_loss=0.01163, audio_tagging_loss=0.007696, over 15435.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09145, pruned_loss=0.01258, audio_tagging_loss=0.008733, over 3048271.94 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:59:16,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.836e+01 9.428e+01 1.004e+02 1.214e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 16:59:17,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3484173.3333333335, ans=0.2 2023-11-26 16:59:25,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3484240.0, ans=0.2 2023-11-26 16:59:26,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs.
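limit=15.0

The many ScheduledFloat entries (scaling.py:213) record hyperparameters such as dropout_p, skip rates and balancer probs that are functions of the global batch_count rather than fixed constants: each entry logs the schedule's current value ("ans") at the current batch_count. A minimal sketch of one plausible form, a piecewise-linear schedule over batch count; the breakpoints below are illustrative, not taken from this recipe:

def scheduled_float(batch_count: float, points) -> float:
    # points: (batch_count, value) pairs sorted by batch_count.
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            # Linear interpolation between neighbouring breakpoints.
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
    return points[-1][1]

# A dropout decaying from 0.3 to a floor of 0.1 over the first 20k batches
# would long since sit at its floor at the batch_counts logged above:
assert scheduled_float(3484240.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1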
2023-11-26 16:59:29,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3484240.0, ans=0.125 2023-11-26 16:59:38,566 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522650 2023-11-26 16:59:46,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3484306.6666666665, ans=0.125 2023-11-26 16:59:56,681 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:59:56,928 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:00:09,441 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5650, loss[loss=0.06978, simple_loss=0.09064, pruned_loss=0.01597, audio_tagging_loss=0.008496, over 15295.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09108, pruned_loss=0.01242, audio_tagging_loss=0.008808, over 3052254.18 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:00:10,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3484506.6666666665, ans=0.0 2023-11-26 17:00:34,595 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522700 2023-11-26 17:00:46,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3484706.6666666665, ans=0.1 2023-11-26 17:00:57,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3484773.3333333335, ans=0.0 2023-11-26 17:01:01,194 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:01:02,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3484773.3333333335, ans=0.125 2023-11-26 17:01:05,349 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5700, loss[loss=0.05577, simple_loss=0.08229, pruned_loss=0.006158, audio_tagging_loss=0.008466, over 16072.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09027, pruned_loss=0.01237, audio_tagging_loss=0.008823, over 3053658.55 frames.
], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:01:08,015 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.017e+01 9.014e+01 9.489e+01 1.005e+02 1.284e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 17:01:19,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3484906.6666666665, ans=0.125 2023-11-26 17:01:20,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3484906.6666666665, ans=0.0 2023-11-26 17:01:24,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3484906.6666666665, ans=0.125 2023-11-26 17:01:25,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3484906.6666666665, ans=0.0 2023-11-26 17:01:29,926 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522750 2023-11-26 17:01:52,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.31 vs. limit=10.0 2023-11-26 17:01:58,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2023-11-26 17:02:00,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3485173.3333333335, ans=0.125 2023-11-26 17:02:01,805 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5750, loss[loss=0.05721, simple_loss=0.07861, pruned_loss=0.01016, audio_tagging_loss=0.007746, over 15956.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09072, pruned_loss=0.01233, audio_tagging_loss=0.008737, over 3061410.71 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:02:19,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=12.0 2023-11-26 17:02:25,414 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522800 2023-11-26 17:02:46,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-26 17:02:50,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2023-11-26 17:02:57,296 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5800, loss[loss=0.05447, simple_loss=0.07037, pruned_loss=0.01192, audio_tagging_loss=0.007364, over 13502.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09079, pruned_loss=0.01245, audio_tagging_loss=0.008615, over 3060518.62 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:02:59,435 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.825e+01 9.413e+01 1.036e+02 1.628e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 17:03:23,250 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522850 2023-11-26 17:03:44,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.50 vs. 
limit=15.0 2023-11-26 17:03:46,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3485773.3333333335, ans=0.1 2023-11-26 17:03:52,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3485840.0, ans=0.1 2023-11-26 17:03:53,387 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5850, loss[loss=0.06592, simple_loss=0.08944, pruned_loss=0.01319, audio_tagging_loss=0.008015, over 16514.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08996, pruned_loss=0.01231, audio_tagging_loss=0.008609, over 3065676.54 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:04:12,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3485906.6666666665, ans=0.0 2023-11-26 17:04:18,737 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522900 2023-11-26 17:04:29,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3486040.0, ans=0.125 2023-11-26 17:04:45,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3486106.6666666665, ans=0.0 2023-11-26 17:04:49,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=15.0 2023-11-26 17:04:50,434 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5900, loss[loss=0.07024, simple_loss=0.0934, pruned_loss=0.01415, audio_tagging_loss=0.009393, over 14525.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08982, pruned_loss=0.0122, audio_tagging_loss=0.008609, over 3069086.01 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:04:52,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.809e+01 9.343e+01 1.010e+02 1.341e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 17:04:55,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=15.0 2023-11-26 17:05:05,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3486240.0, ans=0.125 2023-11-26 17:05:13,765 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522950 2023-11-26 17:05:17,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3486306.6666666665, ans=10.0 2023-11-26 17:05:18,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3486306.6666666665, ans=0.025 2023-11-26 17:05:18,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3486306.6666666665, ans=0.0 2023-11-26 17:05:26,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=12.0 2023-11-26 17:05:30,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.35 vs. 
limit=12.0 2023-11-26 17:05:37,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3486440.0, ans=0.05 2023-11-26 17:05:45,290 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5950, loss[loss=0.06227, simple_loss=0.08582, pruned_loss=0.01335, audio_tagging_loss=0.006012, over 16253.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08943, pruned_loss=0.01226, audio_tagging_loss=0.008625, over 3072434.14 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:06:00,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. limit=6.0 2023-11-26 17:06:02,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2023-11-26 17:06:10,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523000 2023-11-26 17:06:11,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3486640.0, ans=0.125 2023-11-26 17:06:12,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3486640.0, ans=0.1 2023-11-26 17:06:13,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2023-11-26 17:06:17,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3486640.0, ans=0.2 2023-11-26 17:06:22,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3486706.6666666665, ans=0.125 2023-11-26 17:06:26,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3486706.6666666665, ans=0.1 2023-11-26 17:06:29,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3486773.3333333335, ans=0.1 2023-11-26 17:06:38,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3486773.3333333335, ans=0.125 2023-11-26 17:06:40,784 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6000, loss[loss=0.0474, simple_loss=0.06554, pruned_loss=0.00532, audio_tagging_loss=0.009307, over 14657.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08951, pruned_loss=0.01225, audio_tagging_loss=0.008594, over 3073416.86 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:06:40,784 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 17:07:05,491 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3406, 5.0170, 4.6729, 5.1810], device='cuda:1') 2023-11-26 17:07:13,757 INFO [train_asr.py:1267] (1/4) Epoch 44, validation: loss=0.05792, simple_loss=0.05061, pruned_loss=0.005328, audio_tagging_loss=0.02728, over 4681554.00 frames. 
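The loss fields in the training and validation entries are internally consistent with a weighted sum in which the smoothed transducer term is halved: loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. Checking that inferred identity against the validation entry just above (the same arithmetic reproduces the per-batch loss[...] and tot_loss[...] totals throughout this section):

# Weights inferred from the logged numbers, not read from the training code.
simple_loss, pruned_loss, audio_tagging_loss = 0.05061, 0.005328, 0.02728
total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
assert abs(total - 0.05792) < 5e-5  # 0.057913 vs. the logged loss=0.05792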
2023-11-26 17:07:13,758 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 17:07:16,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.949e+01 9.418e+01 1.019e+02 1.469e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 17:07:20,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3486840.0, ans=0.125 2023-11-26 17:07:34,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3486973.3333333335, ans=0.02 2023-11-26 17:07:37,173 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523050 2023-11-26 17:07:56,224 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:08:00,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3487106.6666666665, ans=0.05 2023-11-26 17:08:04,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3487106.6666666665, ans=0.125 2023-11-26 17:08:08,761 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6050, loss[loss=0.07939, simple_loss=0.111, pruned_loss=0.01689, audio_tagging_loss=0.006998, over 14899.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.0898, pruned_loss=0.01221, audio_tagging_loss=0.008527, over 3073384.66 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:08:14,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3487173.3333333335, ans=0.0 2023-11-26 17:08:15,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3487173.3333333335, ans=0.07 2023-11-26 17:08:22,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3487240.0, ans=0.2 2023-11-26 17:08:22,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3487240.0, ans=0.07 2023-11-26 17:08:31,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3487306.6666666665, ans=0.125 2023-11-26 17:08:33,705 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523100 2023-11-26 17:08:57,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=15.0 2023-11-26 17:09:04,402 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6100, loss[loss=0.08551, simple_loss=0.1156, pruned_loss=0.01805, audio_tagging_loss=0.009671, over 15221.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09047, pruned_loss=0.01234, audio_tagging_loss=0.008468, over 3071537.92 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:09:08,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.667e+01 9.163e+01 9.920e+01 1.251e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 17:09:13,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3487506.6666666665, ans=0.125 2023-11-26 17:09:29,477 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523150 2023-11-26 17:10:00,750 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6150, loss[loss=0.05991, simple_loss=0.08776, pruned_loss=0.009074, audio_tagging_loss=0.006951, over 15961.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09049, pruned_loss=0.01233, audio_tagging_loss=0.008552, over 3063131.33 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:10:02,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3487840.0, ans=10.0 2023-11-26 17:10:04,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3487840.0, ans=0.0 2023-11-26 17:10:05,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3487840.0, ans=0.125 2023-11-26 17:10:05,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3487840.0, ans=0.2 2023-11-26 17:10:21,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3487973.3333333335, ans=0.0 2023-11-26 17:10:24,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523200 2023-11-26 17:10:35,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3488040.0, ans=0.0 2023-11-26 17:10:40,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3488040.0, ans=0.04949747468305833 2023-11-26 17:10:52,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3488106.6666666665, ans=0.125 2023-11-26 17:10:52,566 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:10:56,656 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6200, loss[loss=0.0586, simple_loss=0.08047, pruned_loss=0.01031, audio_tagging_loss=0.008063, over 16464.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08975, pruned_loss=0.01224, audio_tagging_loss=0.008647, over 3057472.13 frames. 
], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:10:59,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3488173.3333333335, ans=0.1 2023-11-26 17:10:59,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.701e+01 9.346e+01 1.022e+02 1.320e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 17:11:06,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3488240.0, ans=0.0 2023-11-26 17:11:07,656 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:11:15,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3488240.0, ans=0.0 2023-11-26 17:11:22,130 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523250 2023-11-26 17:11:39,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3488373.3333333335, ans=0.2 2023-11-26 17:11:44,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3488440.0, ans=0.0 2023-11-26 17:11:46,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3488440.0, ans=0.2 2023-11-26 17:11:52,832 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6250, loss[loss=0.06645, simple_loss=0.09089, pruned_loss=0.01196, audio_tagging_loss=0.009053, over 14801.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08954, pruned_loss=0.01219, audio_tagging_loss=0.008799, over 3055869.42 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:12:09,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3488573.3333333335, ans=15.0 2023-11-26 17:12:12,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3488573.3333333335, ans=0.0 2023-11-26 17:12:17,948 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523300 2023-11-26 17:12:18,190 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:12:47,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3488773.3333333335, ans=0.125 2023-11-26 17:12:49,234 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6300, loss[loss=0.07573, simple_loss=0.09135, pruned_loss=0.01624, audio_tagging_loss=0.01382, over 15374.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09019, pruned_loss=0.0122, audio_tagging_loss=0.008823, over 3049342.06 frames. 
], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:12:51,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3488840.0, ans=0.0 2023-11-26 17:12:52,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.680e+01 9.348e+01 1.004e+02 1.184e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 17:13:13,208 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523350 2023-11-26 17:13:19,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0 2023-11-26 17:13:19,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.84 vs. limit=15.0 2023-11-26 17:13:21,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3489040.0, ans=0.2 2023-11-26 17:13:44,474 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6350, loss[loss=0.06918, simple_loss=0.09798, pruned_loss=0.01079, audio_tagging_loss=0.009402, over 15347.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09055, pruned_loss=0.01237, audio_tagging_loss=0.00875, over 3048784.17 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:14:00,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3489240.0, ans=0.125 2023-11-26 17:14:09,536 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523400 2023-11-26 17:14:31,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3489440.0, ans=0.0 2023-11-26 17:14:40,443 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6400, loss[loss=0.06263, simple_loss=0.08855, pruned_loss=0.009127, audio_tagging_loss=0.00923, over 14690.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09031, pruned_loss=0.01224, audio_tagging_loss=0.008821, over 3043675.30 frames. 
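[The scaling.py:213 "ScheduledFloat" entries track regularization hyperparameters (dropout probabilities, skip rates, balancer bounds) that are functions of the global batch_count rather than constants; the logged ans= value is the schedule's current value. A minimal sketch of the idea as a piecewise-linear interpolation over (batch_count, value) breakpoints; the actual icefall class has more machinery than shown here:]

```python
class ScheduledFloat:
    """Float-valued hyperparameter interpolated against the batch count.
    Sketch only: points are (batch_count, value) pairs, sorted ascending."""
    def __init__(self, *points):
        self.points = sorted(points)
        self.batch_count = 0.0   # advanced externally as training progresses
    def __float__(self):
        p = self.points
        if self.batch_count <= p[0][0]:
            return float(p[0][1])
        if self.batch_count >= p[-1][0]:
            return float(p[-1][1])
        for (x0, y0), (x1, y1) in zip(p, p[1:]):
            if x0 <= self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return float(y0 + t * (y1 - y0))

# Hypothetical schedule: a skip rate that decays from 0.07 to 0.0 over the
# first 4000 batches and stays at 0.0 afterwards.
skip_rate = ScheduledFloat((0.0, 0.07), (4000.0, 0.0))
```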
], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:14:45,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.723e+01 9.338e+01 1.021e+02 1.186e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 17:14:45,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3489506.6666666665, ans=0.1 2023-11-26 17:14:49,137 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:15:01,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3489573.3333333335, ans=0.0 2023-11-26 17:15:05,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523450 2023-11-26 17:15:11,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3489640.0, ans=0.125 2023-11-26 17:15:14,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3489706.6666666665, ans=0.07 2023-11-26 17:15:14,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3489706.6666666665, ans=0.1 2023-11-26 17:15:31,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3489773.3333333335, ans=0.125 2023-11-26 17:15:37,386 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6450, loss[loss=0.0786, simple_loss=0.1185, pruned_loss=0.01554, audio_tagging_loss=0.003787, over 15851.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09121, pruned_loss=0.01237, audio_tagging_loss=0.008742, over 3047782.81 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:15:41,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0 2023-11-26 17:16:01,363 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523500 2023-11-26 17:16:13,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3490040.0, ans=0.125 2023-11-26 17:16:16,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3490040.0, ans=0.0 2023-11-26 17:16:26,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3490106.6666666665, ans=0.125 2023-11-26 17:16:26,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3490106.6666666665, ans=0.0 2023-11-26 17:16:32,625 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6500, loss[loss=0.07609, simple_loss=0.1035, pruned_loss=0.01388, audio_tagging_loss=0.01046, over 15015.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09041, pruned_loss=0.01239, audio_tagging_loss=0.008771, over 3049238.87 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:16:33,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3490173.3333333335, ans=0.125 2023-11-26 17:16:37,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.749e+01 9.498e+01 1.005e+02 1.590e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 17:16:41,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3490173.3333333335, ans=0.125 2023-11-26 17:16:57,451 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523550 2023-11-26 17:16:59,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2023-11-26 17:17:26,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3490440.0, ans=0.2 2023-11-26 17:17:28,339 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6550, loss[loss=0.05508, simple_loss=0.07638, pruned_loss=0.009591, audio_tagging_loss=0.007294, over 15169.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09055, pruned_loss=0.01225, audio_tagging_loss=0.008622, over 3049214.51 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:17:37,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=12.0 2023-11-26 17:17:44,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3490573.3333333335, ans=0.0 2023-11-26 17:17:53,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523600 2023-11-26 17:18:23,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3490840.0, ans=0.0 2023-11-26 17:18:24,784 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6600, loss[loss=0.05876, simple_loss=0.08559, pruned_loss=0.007422, audio_tagging_loss=0.008544, over 15037.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08961, pruned_loss=0.01213, audio_tagging_loss=0.008585, over 3047240.99 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:18:30,611 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.733e+01 9.478e+01 1.039e+02 1.396e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 17:18:44,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3490906.6666666665, ans=0.035 2023-11-26 17:18:48,744 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523650 2023-11-26 17:18:54,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-11-26 17:19:03,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3491040.0, ans=0.1 2023-11-26 17:19:11,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=12.0 2023-11-26 17:19:19,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. 
limit=15.0 2023-11-26 17:19:20,545 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6650, loss[loss=0.04693, simple_loss=0.05869, pruned_loss=0.008214, audio_tagging_loss=0.00937, over 14418.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08925, pruned_loss=0.01221, audio_tagging_loss=0.008521, over 3049957.76 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:19:22,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2023-11-26 17:19:32,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3491240.0, ans=0.125 2023-11-26 17:19:38,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=15.0 2023-11-26 17:19:45,630 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523700 2023-11-26 17:20:00,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3491373.3333333335, ans=0.125 2023-11-26 17:20:03,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3491373.3333333335, ans=0.2 2023-11-26 17:20:08,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3491440.0, ans=6.0 2023-11-26 17:20:15,805 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6700, loss[loss=0.07309, simple_loss=0.1007, pruned_loss=0.01359, audio_tagging_loss=0.009167, over 14688.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08982, pruned_loss=0.01227, audio_tagging_loss=0.008472, over 3047163.62 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:20:19,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3491506.6666666665, ans=0.2 2023-11-26 17:20:21,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.660e+01 9.419e+01 1.025e+02 1.437e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 17:20:32,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3491573.3333333335, ans=0.0 2023-11-26 17:20:40,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3491640.0, ans=0.1 2023-11-26 17:20:41,454 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523750 2023-11-26 17:21:12,553 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6750, loss[loss=0.06001, simple_loss=0.07966, pruned_loss=0.01204, audio_tagging_loss=0.008142, over 14988.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08971, pruned_loss=0.01229, audio_tagging_loss=0.008544, over 3044798.53 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:21:29,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3491906.6666666665, ans=0.125 2023-11-26 17:21:31,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3491906.6666666665, ans=0.125 2023-11-26 17:21:36,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=12.0 2023-11-26 17:21:36,701 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523800 2023-11-26 17:21:52,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3492040.0, ans=0.125 2023-11-26 17:22:08,690 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6800, loss[loss=0.05852, simple_loss=0.07398, pruned_loss=0.01138, audio_tagging_loss=0.01015, over 15267.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08961, pruned_loss=0.01236, audio_tagging_loss=0.00851, over 3046184.18 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:22:12,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2023-11-26 17:22:13,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 8.963e+01 9.408e+01 1.006e+02 1.345e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 17:22:19,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2023-11-26 17:22:33,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523850 2023-11-26 17:22:41,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3492373.3333333335, ans=0.125 2023-11-26 17:22:41,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2023-11-26 17:22:50,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.03 vs. limit=15.0 2023-11-26 17:22:51,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3492373.3333333335, ans=0.95 2023-11-26 17:22:53,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3492440.0, ans=0.0 2023-11-26 17:22:58,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3492440.0, ans=0.125 2023-11-26 17:23:03,574 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6850, loss[loss=0.06456, simple_loss=0.08987, pruned_loss=0.01379, audio_tagging_loss=0.005833, over 15131.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08996, pruned_loss=0.0124, audio_tagging_loss=0.008422, over 3047436.29 frames. 
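[grad_scale in the batch entries flips between 32.0 and 16.0 (16.0 at batch 6500, back to 32.0 by batch 6800, down again at 6850): the standard fp16 loss-scaling dynamic, where the scale is halved whenever a step produces inf/nan gradients and grows back after a run of clean steps. A sketch of the update rule with torch.cuda.amp.GradScaler-style parameters; the growth interval this run actually uses is not visible in the log:]

```python
def update_loss_scale(scale, found_inf, clean_steps,
                      growth_factor=2.0, backoff_factor=0.5,
                      growth_interval=2000):
    # Returns the next (scale, clean_steps): halve on overflow, double after
    # growth_interval consecutive clean steps, otherwise keep counting.
    if found_inf:
        return scale * backoff_factor, 0
    clean_steps += 1
    if clean_steps >= growth_interval:
        return scale * growth_factor, 0
    return scale, clean_steps
```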
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:23:08,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3492506.6666666665, ans=0.07 2023-11-26 17:23:10,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3492506.6666666665, ans=0.125 2023-11-26 17:23:12,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3492506.6666666665, ans=0.125 2023-11-26 17:23:27,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3492640.0, ans=0.0 2023-11-26 17:23:28,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3492640.0, ans=0.125 2023-11-26 17:23:29,036 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523900 2023-11-26 17:23:29,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3492640.0, ans=0.125 2023-11-26 17:23:44,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3492706.6666666665, ans=0.2 2023-11-26 17:23:47,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3492773.3333333335, ans=0.0 2023-11-26 17:23:59,744 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6900, loss[loss=0.05855, simple_loss=0.0792, pruned_loss=0.007779, audio_tagging_loss=0.01117, over 14842.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08883, pruned_loss=0.01199, audio_tagging_loss=0.008491, over 3045748.02 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:24:07,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.659e+01 9.332e+01 1.017e+02 1.232e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 17:24:24,250 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523950 2023-11-26 17:24:36,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.74 vs. limit=10.0 2023-11-26 17:24:42,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3493040.0, ans=0.125 2023-11-26 17:24:44,341 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:24:54,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3493106.6666666665, ans=0.2 2023-11-26 17:24:55,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3493173.3333333335, ans=0.125 2023-11-26 17:24:56,066 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6950, loss[loss=0.05497, simple_loss=0.07877, pruned_loss=0.007659, audio_tagging_loss=0.007925, over 15151.00 frames. 
], tot_loss[loss=0.06561, simple_loss=0.08998, pruned_loss=0.01213, audio_tagging_loss=0.00849, over 3046774.18 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:25:20,078 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524000 2023-11-26 17:25:44,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3493440.0, ans=0.125 2023-11-26 17:25:50,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3493440.0, ans=0.1 2023-11-26 17:25:53,705 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7000, loss[loss=0.05367, simple_loss=0.06999, pruned_loss=0.01052, audio_tagging_loss=0.008153, over 15631.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08931, pruned_loss=0.01206, audio_tagging_loss=0.008664, over 3048903.48 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:26:00,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.888e+01 9.554e+01 1.025e+02 1.624e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 17:26:01,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3493506.6666666665, ans=0.0 2023-11-26 17:26:19,195 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524050 2023-11-26 17:26:22,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3493640.0, ans=0.95 2023-11-26 17:26:24,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3493640.0, ans=0.125 2023-11-26 17:26:35,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3493706.6666666665, ans=0.125 2023-11-26 17:26:38,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3493773.3333333335, ans=0.1 2023-11-26 17:26:44,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2023-11-26 17:26:47,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3493773.3333333335, ans=0.04949747468305833 2023-11-26 17:26:49,084 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7050, loss[loss=0.0744, simple_loss=0.1051, pruned_loss=0.01386, audio_tagging_loss=0.008012, over 15517.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08903, pruned_loss=0.01205, audio_tagging_loss=0.008767, over 3042210.81 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:27:14,144 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524100 2023-11-26 17:27:20,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3493973.3333333335, ans=0.125 2023-11-26 17:27:23,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3494040.0, ans=0.125 2023-11-26 17:27:31,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.51 vs. 
limit=15.0 2023-11-26 17:27:37,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=22.5 2023-11-26 17:27:46,080 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7100, loss[loss=0.05169, simple_loss=0.06074, pruned_loss=0.008564, audio_tagging_loss=0.01275, over 14371.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08794, pruned_loss=0.0119, audio_tagging_loss=0.008908, over 3040545.39 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:27:52,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.801e+01 9.665e+01 1.022e+02 1.314e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 17:27:55,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3494240.0, ans=0.125 2023-11-26 17:28:09,365 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524150 2023-11-26 17:28:22,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3494373.3333333335, ans=0.125 2023-11-26 17:28:23,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3494373.3333333335, ans=0.125 2023-11-26 17:28:24,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0 2023-11-26 17:28:25,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3494373.3333333335, ans=0.2 2023-11-26 17:28:38,682 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:28:40,570 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7150, loss[loss=0.06486, simple_loss=0.09156, pruned_loss=0.008878, audio_tagging_loss=0.0102, over 15730.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08966, pruned_loss=0.01225, audio_tagging_loss=0.008951, over 3047715.13 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:28:45,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3494506.6666666665, ans=0.2 2023-11-26 17:28:49,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3494506.6666666665, ans=0.125 2023-11-26 17:29:05,539 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524200 2023-11-26 17:29:16,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3494706.6666666665, ans=0.125 2023-11-26 17:29:31,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3494773.3333333335, ans=0.125 2023-11-26 17:29:36,132 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7200, loss[loss=0.0783, simple_loss=0.1146, pruned_loss=0.01535, audio_tagging_loss=0.00565, over 14801.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08971, pruned_loss=0.01221, audio_tagging_loss=0.008953, over 3049010.01 frames. 
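[The scaling.py:1022 "Whitening" entries fire when a module's output covariance drifts away from white, logging the measured metric against a per-module limit (here 10.0/12.0/15.0/22.5); a metric of 1.0 would mean the covariance is already a multiple of the identity. The following is an illustrative way to compute such a non-whiteness ratio (mean squared eigenvalue over squared mean eigenvalue, per group); icefall's exact statistic may differ:]

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels). Returns the worst per-group ratio
    # E[lambda^2] / (E[lambda])^2 of the covariance eigenvalues, which is
    # 1.0 iff the covariance is c * I and grows as it becomes less white.
    n, c = x.shape
    g = c // num_groups
    worst = 1.0
    for k in range(num_groups):
        xg = x[:, k * g:(k + 1) * g]
        xg = xg - xg.mean(dim=0, keepdim=True)
        cov = (xg.T @ xg) / n
        mean_eig_sq = (cov @ cov).diagonal().mean()   # trace(cov^2) / g
        sq_mean_eig = cov.diagonal().mean() ** 2      # (trace(cov) / g)^2
        worst = max(worst, float(mean_eig_sq / sq_mean_eig))
    return worst
```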
], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:29:43,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.953e+01 9.579e+01 1.038e+02 1.531e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 17:29:45,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3494840.0, ans=0.07 2023-11-26 17:29:52,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3494906.6666666665, ans=0.1 2023-11-26 17:30:01,768 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524250 2023-11-26 17:30:05,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3494973.3333333335, ans=0.125 2023-11-26 17:30:06,251 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:30:31,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3495173.3333333335, ans=0.125 2023-11-26 17:30:32,733 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7250, loss[loss=0.07418, simple_loss=0.111, pruned_loss=0.01419, audio_tagging_loss=0.004477, over 15489.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09036, pruned_loss=0.01235, audio_tagging_loss=0.008882, over 3051781.68 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:30:44,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3495240.0, ans=0.2 2023-11-26 17:30:48,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3495240.0, ans=0.1 2023-11-26 17:30:53,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3495306.6666666665, ans=0.125 2023-11-26 17:30:56,622 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524300 2023-11-26 17:31:00,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3495306.6666666665, ans=0.2 2023-11-26 17:31:14,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3495373.3333333335, ans=0.2 2023-11-26 17:31:16,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3495440.0, ans=0.125 2023-11-26 17:31:17,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.24 vs. limit=15.0 2023-11-26 17:31:28,172 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7300, loss[loss=0.05614, simple_loss=0.08237, pruned_loss=0.006776, audio_tagging_loss=0.008182, over 14333.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09035, pruned_loss=0.01229, audio_tagging_loss=0.008822, over 3049251.42 frames. 
], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:31:34,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3495506.6666666665, ans=0.0 2023-11-26 17:31:35,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.863e+01 9.398e+01 1.006e+02 1.192e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 17:31:52,478 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524350 2023-11-26 17:31:53,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2023-11-26 17:32:05,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3495706.6666666665, ans=0.2 2023-11-26 17:32:23,233 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7350, loss[loss=0.06067, simple_loss=0.07528, pruned_loss=0.0134, audio_tagging_loss=0.009637, over 15143.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09065, pruned_loss=0.0124, audio_tagging_loss=0.008611, over 3053935.26 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:32:29,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3495840.0, ans=0.1 2023-11-26 17:32:41,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3495906.6666666665, ans=0.125 2023-11-26 17:32:42,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5 2023-11-26 17:32:48,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524400 2023-11-26 17:33:08,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3496106.6666666665, ans=0.0 2023-11-26 17:33:20,280 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7400, loss[loss=0.07732, simple_loss=0.1011, pruned_loss=0.01698, audio_tagging_loss=0.009807, over 14979.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09052, pruned_loss=0.01233, audio_tagging_loss=0.00852, over 3048833.05 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:33:27,565 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 8.991e+01 9.600e+01 1.026e+02 1.969e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-26 17:33:31,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2023-11-26 17:33:44,207 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524450 2023-11-26 17:33:55,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3496373.3333333335, ans=0.125 2023-11-26 17:33:58,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.80 vs. 
limit=15.0 2023-11-26 17:34:00,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3496373.3333333335, ans=0.0 2023-11-26 17:34:15,334 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7450, loss[loss=0.08237, simple_loss=0.112, pruned_loss=0.0165, audio_tagging_loss=0.009888, over 14987.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09046, pruned_loss=0.01225, audio_tagging_loss=0.008481, over 3041146.47 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:34:25,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3496573.3333333335, ans=0.0 2023-11-26 17:34:40,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524500 2023-11-26 17:34:57,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3496706.6666666665, ans=0.125 2023-11-26 17:35:01,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3496773.3333333335, ans=0.125 2023-11-26 17:35:03,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3496773.3333333335, ans=0.1 2023-11-26 17:35:11,138 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7500, loss[loss=0.07034, simple_loss=0.0995, pruned_loss=0.01369, audio_tagging_loss=0.0069, over 16061.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.0899, pruned_loss=0.01231, audio_tagging_loss=0.008512, over 3051165.18 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:35:13,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3496840.0, ans=0.1 2023-11-26 17:35:15,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3496840.0, ans=0.1 2023-11-26 17:35:17,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2023-11-26 17:35:19,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.894e+01 9.302e+01 1.027e+02 1.348e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-26 17:35:27,877 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:35:32,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3496906.6666666665, ans=0.2 2023-11-26 17:35:33,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3496973.3333333335, ans=0.2 2023-11-26 17:35:36,270 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524550 2023-11-26 17:35:54,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. 
limit=6.0 2023-11-26 17:35:56,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3497106.6666666665, ans=0.1 2023-11-26 17:36:07,374 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7550, loss[loss=0.06028, simple_loss=0.07716, pruned_loss=0.01162, audio_tagging_loss=0.01008, over 16453.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08947, pruned_loss=0.01225, audio_tagging_loss=0.008527, over 3056303.37 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:36:18,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3497240.0, ans=0.0 2023-11-26 17:36:26,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3497240.0, ans=0.0 2023-11-26 17:36:32,055 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524600 2023-11-26 17:36:40,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3497373.3333333335, ans=0.125 2023-11-26 17:36:46,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2023-11-26 17:36:47,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=12.0 2023-11-26 17:36:53,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3497440.0, ans=0.125 2023-11-26 17:36:58,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3497440.0, ans=0.1 2023-11-26 17:37:03,388 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7600, loss[loss=0.06727, simple_loss=0.08857, pruned_loss=0.01351, audio_tagging_loss=0.00947, over 14751.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08893, pruned_loss=0.01214, audio_tagging_loss=0.00856, over 3051765.81 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:37:04,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3497506.6666666665, ans=0.5 2023-11-26 17:37:05,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3497506.6666666665, ans=0.0 2023-11-26 17:37:08,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3497506.6666666665, ans=0.125 2023-11-26 17:37:10,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.779e+01 9.690e+01 1.052e+02 1.195e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 17:37:11,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3497506.6666666665, ans=0.1 2023-11-26 17:37:27,974 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524650 2023-11-26 17:37:58,863 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7650, loss[loss=0.05519, simple_loss=0.08083, pruned_loss=0.005829, audio_tagging_loss=0.008944, over 15076.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08902, pruned_loss=0.0122, audio_tagging_loss=0.008538, over 3049774.96 frames. 
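[The tot_loss[...] values are not single-batch numbers: the trailing "over N frames" climbs toward ~3M frames, so they are frame-weighted running aggregates over recent batches. A sketch of that bookkeeping with hypothetical names (icefall keeps similar per-loss totals internally, details differ):]

```python
class RunningLoss(dict):
    """Frame-weighted running loss totals, reported as per-frame averages.
    Hypothetical helper mirroring the 'tot_loss[...] over N frames' lines."""
    def accumulate(self, frames, **losses):
        self["frames"] = self.get("frames", 0.0) + frames
        for name, value in losses.items():
            # store frame-weighted sums so the report is total / frames
            self[name] = self.get(name, 0.0) + value * frames
    def averages(self):
        frames = self["frames"]
        return {name: total / frames
                for name, total in self.items() if name != "frames"}
```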
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:38:07,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2023-11-26 17:38:08,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-11-26 17:38:11,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3497906.6666666665, ans=0.125 2023-11-26 17:38:23,438 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524700 2023-11-26 17:38:50,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3498106.6666666665, ans=0.0 2023-11-26 17:38:52,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3498106.6666666665, ans=0.0 2023-11-26 17:38:54,586 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7700, loss[loss=0.0753, simple_loss=0.1048, pruned_loss=0.01411, audio_tagging_loss=0.008809, over 15019.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09008, pruned_loss=0.0123, audio_tagging_loss=0.008511, over 3045772.71 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:39:02,511 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.021e+01 9.589e+01 1.019e+02 1.364e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 17:39:11,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3498240.0, ans=0.125 2023-11-26 17:39:14,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3498240.0, ans=0.125 2023-11-26 17:39:18,500 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524750 2023-11-26 17:39:24,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3498306.6666666665, ans=0.125 2023-11-26 17:39:28,971 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:39:30,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3498373.3333333335, ans=0.0 2023-11-26 17:39:33,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3498373.3333333335, ans=0.125 2023-11-26 17:39:34,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2023-11-26 17:39:35,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3498373.3333333335, ans=0.125 2023-11-26 17:39:38,607 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:39:39,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3498440.0, ans=0.0 2023-11-26 17:39:50,745 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7750, loss[loss=0.05845, simple_loss=0.07451, pruned_loss=0.01108, audio_tagging_loss=0.01012, over 14745.00 frames. 
], tot_loss[loss=0.06594, simple_loss=0.09012, pruned_loss=0.01229, audio_tagging_loss=0.008592, over 3043783.14 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:39:58,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3498506.6666666665, ans=0.1 2023-11-26 17:39:59,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3498506.6666666665, ans=0.125 2023-11-26 17:40:02,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5 2023-11-26 17:40:10,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3498573.3333333335, ans=0.95 2023-11-26 17:40:10,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3498573.3333333335, ans=0.025 2023-11-26 17:40:14,460 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524800 2023-11-26 17:40:14,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3498640.0, ans=0.04949747468305833 2023-11-26 17:40:29,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3498706.6666666665, ans=0.125 2023-11-26 17:40:31,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3498706.6666666665, ans=0.1 2023-11-26 17:40:45,533 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7800, loss[loss=0.0563, simple_loss=0.07245, pruned_loss=0.01055, audio_tagging_loss=0.009527, over 14901.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09032, pruned_loss=0.01226, audio_tagging_loss=0.008652, over 3044228.96 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:40:49,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498840.0, ans=0.1 2023-11-26 17:40:54,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.862e+01 9.365e+01 1.011e+02 1.202e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 17:41:02,139 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:41:04,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3498906.6666666665, ans=0.0 2023-11-26 17:41:11,071 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524850 2023-11-26 17:41:31,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3499106.6666666665, ans=0.125 2023-11-26 17:41:41,888 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7850, loss[loss=0.04922, simple_loss=0.06364, pruned_loss=0.007131, audio_tagging_loss=0.01027, over 15290.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09125, pruned_loss=0.0125, audio_tagging_loss=0.008658, over 3040252.68 frames. 
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:41:46,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3499173.3333333335, ans=0.0 2023-11-26 17:41:53,795 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:42:06,276 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524900 2023-11-26 17:42:29,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3499440.0, ans=0.0 2023-11-26 17:42:33,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3499440.0, ans=0.025 2023-11-26 17:42:37,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3499506.6666666665, ans=0.1 2023-11-26 17:42:38,161 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7900, loss[loss=0.04842, simple_loss=0.06924, pruned_loss=0.00657, audio_tagging_loss=0.007234, over 15563.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.091, pruned_loss=0.01259, audio_tagging_loss=0.008672, over 3036816.20 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:42:40,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3499506.6666666665, ans=0.0 2023-11-26 17:42:46,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 9.062e+01 9.709e+01 1.039e+02 1.444e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-26 17:42:57,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3499573.3333333335, ans=0.0 2023-11-26 17:42:58,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3499640.0, ans=0.2 2023-11-26 17:43:01,424 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524950 2023-11-26 17:43:05,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3499640.0, ans=0.2 2023-11-26 17:43:28,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3499773.3333333335, ans=0.0 2023-11-26 17:43:32,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3499840.0, ans=0.125 2023-11-26 17:43:33,419 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7950, loss[loss=0.07281, simple_loss=0.09733, pruned_loss=0.01572, audio_tagging_loss=0.008425, over 15213.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.0905, pruned_loss=0.01233, audio_tagging_loss=0.008831, over 3047335.65 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:43:45,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3499906.6666666665, ans=0.125 2023-11-26 17:43:50,574 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:43:53,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3499906.6666666665, ans=0.125 2023-11-26 17:43:55,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3499973.3333333335, ans=0.1 2023-11-26 17:43:59,132 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525000 2023-11-26 17:44:05,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3499973.3333333335, ans=0.125 2023-11-26 17:44:28,916 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8000, loss[loss=0.07056, simple_loss=0.09289, pruned_loss=0.01434, audio_tagging_loss=0.009777, over 17052.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08961, pruned_loss=0.01225, audio_tagging_loss=0.008898, over 3045126.26 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:44:29,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2023-11-26 17:44:37,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3500173.3333333335, ans=0.125 2023-11-26 17:44:38,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.982e+01 9.655e+01 1.047e+02 1.497e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-26 17:44:54,214 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525050 2023-11-26 17:44:56,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3500306.6666666665, ans=0.125 2023-11-26 17:44:57,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3500306.6666666665, ans=0.1 2023-11-26 17:45:14,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3500440.0, ans=0.1 2023-11-26 17:45:25,875 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8050, loss[loss=0.07206, simple_loss=0.09182, pruned_loss=0.01614, audio_tagging_loss=0.01002, over 16263.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08916, pruned_loss=0.01232, audio_tagging_loss=0.008948, over 3045357.52 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:45:26,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3500506.6666666665, ans=0.0 2023-11-26 17:45:28,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3500506.6666666665, ans=0.125 2023-11-26 17:45:28,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3500506.6666666665, ans=0.125 2023-11-26 17:45:42,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.72 vs. 
limit=12.0 2023-11-26 17:45:46,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2023-11-26 17:45:49,200 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525100 2023-11-26 17:45:49,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3500640.0, ans=0.125 2023-11-26 17:46:16,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3500773.3333333335, ans=0.125 2023-11-26 17:46:21,178 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8100, loss[loss=0.06085, simple_loss=0.08491, pruned_loss=0.01013, audio_tagging_loss=0.008257, over 16228.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09024, pruned_loss=0.01263, audio_tagging_loss=0.008791, over 3042615.16 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:46:29,605 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.964e+01 9.424e+01 1.017e+02 1.199e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 17:46:42,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3500973.3333333335, ans=0.07 2023-11-26 17:46:45,709 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525150 2023-11-26 17:46:54,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3501040.0, ans=0.0 2023-11-26 17:46:58,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3501040.0, ans=0.0 2023-11-26 17:47:16,043 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8150, loss[loss=0.06853, simple_loss=0.09152, pruned_loss=0.01359, audio_tagging_loss=0.009179, over 16430.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08994, pruned_loss=0.01254, audio_tagging_loss=0.008649, over 3041124.85 frames. ], batch size: 65, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:47:19,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3501173.3333333335, ans=0.125 2023-11-26 17:47:32,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3501240.0, ans=0.125 2023-11-26 17:47:32,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3501240.0, ans=0.125 2023-11-26 17:47:35,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3501240.0, ans=0.125 2023-11-26 17:47:37,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3501240.0, ans=0.0 2023-11-26 17:47:41,624 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525200 2023-11-26 17:47:42,745 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:47:44,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.83 vs. 
limit=15.0 2023-11-26 17:47:50,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3501373.3333333335, ans=0.125 2023-11-26 17:48:10,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3501440.0, ans=0.125 2023-11-26 17:48:12,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3501506.6666666665, ans=0.125 2023-11-26 17:48:13,424 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8200, loss[loss=0.08355, simple_loss=0.1151, pruned_loss=0.02016, audio_tagging_loss=0.005851, over 15716.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09028, pruned_loss=0.01255, audio_tagging_loss=0.008484, over 3049624.32 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:48:16,606 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:48:20,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2023-11-26 17:48:22,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.766e+01 9.406e+01 1.001e+02 1.239e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 17:48:29,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3501573.3333333335, ans=0.125 2023-11-26 17:48:36,854 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525250 2023-11-26 17:48:55,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3501706.6666666665, ans=0.035 2023-11-26 17:49:04,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3501773.3333333335, ans=0.0 2023-11-26 17:49:08,678 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8250, loss[loss=0.04161, simple_loss=0.05656, pruned_loss=0.003124, audio_tagging_loss=0.01021, over 14986.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09066, pruned_loss=0.0125, audio_tagging_loss=0.008477, over 3039345.33 frames. 
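
[Editor's note] Each `train_asr.py:1235` entry above reports the current batch's loss decomposition plus `tot_loss`, a frame-weighted running average. Judging by the logged numbers, the total is consistent with `loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss` (the 0.5 and 1.0 scales are an assumption inferred from the arithmetic, e.g. batch 8250 above: 0.5*0.05656 + 0.003124 + 0.01021 = 0.04161). A minimal sketch of that bookkeeping; names are hypothetical, not icefall's actual tracker class:

```python
from collections import defaultdict

class RunningLoss:
    """Frame-weighted running averages behind "tot_loss[...] over N frames"."""
    def __init__(self):
        self.sums = defaultdict(float)  # per-component weighted sums
        self.frames = 0.0               # total frames seen

    def update(self, components: dict, num_frames: float) -> None:
        # Each batch contributes loss * num_frames to the weighted sum.
        for name, value in components.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        return {k: v / self.frames for k, v in self.sums.items()}

# Batch 8250 from the log above; the assumed scales reproduce loss=0.04161.
simple, pruned, tagging = 0.05656, 0.003124, 0.01021
loss = 0.5 * simple + pruned + 1.0 * tagging
tracker = RunningLoss()
tracker.update({"loss": loss, "simple_loss": simple,
                "pruned_loss": pruned, "audio_tagging_loss": tagging},
               num_frames=14986)
print(round(loss, 5), tracker.averages())
```
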
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:49:18,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3501906.6666666665, ans=0.125 2023-11-26 17:49:23,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3501906.6666666665, ans=0.125 2023-11-26 17:49:25,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3501906.6666666665, ans=0.125 2023-11-26 17:49:33,200 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525300 2023-11-26 17:49:39,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3501973.3333333335, ans=0.125 2023-11-26 17:49:39,572 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:49:43,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3502040.0, ans=0.0 2023-11-26 17:50:01,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3502106.6666666665, ans=0.0 2023-11-26 17:50:03,799 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8300, loss[loss=0.08265, simple_loss=0.1156, pruned_loss=0.01783, audio_tagging_loss=0.007016, over 15198.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09055, pruned_loss=0.01221, audio_tagging_loss=0.008504, over 3048198.34 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:50:14,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.963e+01 9.594e+01 1.033e+02 1.265e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 17:50:19,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3502240.0, ans=0.125 2023-11-26 17:50:28,598 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525350 2023-11-26 17:50:29,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3502306.6666666665, ans=0.1 2023-11-26 17:50:56,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3502440.0, ans=0.0 2023-11-26 17:50:59,499 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8350, loss[loss=0.07824, simple_loss=0.1114, pruned_loss=0.01836, audio_tagging_loss=0.004189, over 16017.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.09011, pruned_loss=0.012, audio_tagging_loss=0.008572, over 3048184.65 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:51:24,024 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525400 2023-11-26 17:51:27,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3502640.0, ans=0.125 2023-11-26 17:51:31,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3502706.6666666665, ans=0.0 2023-11-26 17:51:55,995 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8400, loss[loss=0.06317, simple_loss=0.08428, pruned_loss=0.0129, audio_tagging_loss=0.008137, over 15068.00 frames. 
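
[Editor's note] The "Exclude cut" WARNING entries (e.g. `unbalanced/8C7biyx9TQ4_0.000_1.000.wav` above) drop AudioSet placeholder cuts: 100 input frames shrink to 23 after the encoder's ~4x subsampling, which is fewer than the 24 BPE tokens of the dummy transcript, so the transducer loss would be undefined. A plausible reconstruction of that check; the exact subsampling arithmetic is an assumption, chosen because it maps 100 to 23 as logged:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2dSubsampling arithmetic for an overall factor of ~4.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A pruned-transducer loss needs more encoder frames than tokens.
    return frames_after_subsampling(num_frames) > num_tokens

print(frames_after_subsampling(100))  # 23, matching the warning
print(keep_cut(100, 24))              # False -> cut is excluded
```
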
], tot_loss[loss=0.0659, simple_loss=0.0903, pruned_loss=0.01222, audio_tagging_loss=0.008523, over 3045875.81 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:52:02,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3502840.0, ans=0.1 2023-11-26 17:52:05,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.821e+01 9.539e+01 1.045e+02 1.757e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-26 17:52:18,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2023-11-26 17:52:19,653 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525450 2023-11-26 17:52:50,789 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8450, loss[loss=0.046, simple_loss=0.06079, pruned_loss=0.006581, audio_tagging_loss=0.009019, over 14956.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08996, pruned_loss=0.01226, audio_tagging_loss=0.008617, over 3050413.14 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:52:51,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3503173.3333333335, ans=0.2 2023-11-26 17:53:16,142 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525500 2023-11-26 17:53:27,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3503373.3333333335, ans=15.0 2023-11-26 17:53:45,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3503506.6666666665, ans=0.125 2023-11-26 17:53:46,681 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8500, loss[loss=0.06417, simple_loss=0.09565, pruned_loss=0.008417, audio_tagging_loss=0.007925, over 14956.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08878, pruned_loss=0.01196, audio_tagging_loss=0.008747, over 3051263.08 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:53:58,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.914e+01 9.480e+01 1.019e+02 1.218e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 17:54:07,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3503640.0, ans=0.1 2023-11-26 17:54:10,885 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525550 2023-11-26 17:54:13,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3503640.0, ans=0.125 2023-11-26 17:54:24,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3503706.6666666665, ans=0.1 2023-11-26 17:54:42,940 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8550, loss[loss=0.06281, simple_loss=0.08716, pruned_loss=0.01133, audio_tagging_loss=0.007906, over 15275.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08847, pruned_loss=0.01181, audio_tagging_loss=0.008739, over 3052443.47 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:54:43,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3503840.0, ans=0.1 2023-11-26 17:54:59,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3503906.6666666665, ans=0.2 2023-11-26 17:55:00,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3503906.6666666665, ans=0.07 2023-11-26 17:55:06,691 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525600 2023-11-26 17:55:37,976 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8600, loss[loss=0.07459, simple_loss=0.1149, pruned_loss=0.009759, audio_tagging_loss=0.007385, over 14997.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08952, pruned_loss=0.01198, audio_tagging_loss=0.008847, over 3050677.51 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:55:46,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3504173.3333333335, ans=0.0 2023-11-26 17:55:49,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.808e+01 9.575e+01 1.022e+02 1.354e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 17:55:58,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3504240.0, ans=0.2 2023-11-26 17:56:02,971 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525650 2023-11-26 17:56:06,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3504306.6666666665, ans=0.0 2023-11-26 17:56:10,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3504306.6666666665, ans=0.125 2023-11-26 17:56:25,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3504440.0, ans=0.0 2023-11-26 17:56:33,771 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8650, loss[loss=0.08133, simple_loss=0.12, pruned_loss=0.01418, audio_tagging_loss=0.007127, over 15001.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08994, pruned_loss=0.01209, audio_tagging_loss=0.008863, over 3048751.91 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:56:34,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=22.5 2023-11-26 17:56:41,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3504506.6666666665, ans=0.95 2023-11-26 17:56:52,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.43 vs. limit=22.5 2023-11-26 17:56:58,842 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525700 2023-11-26 17:57:01,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3504640.0, ans=0.0 2023-11-26 17:57:04,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.73 vs. 
limit=10.0 2023-11-26 17:57:06,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3504706.6666666665, ans=0.0 2023-11-26 17:57:14,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3504706.6666666665, ans=0.125 2023-11-26 17:57:30,011 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8700, loss[loss=0.0543, simple_loss=0.07273, pruned_loss=0.009001, audio_tagging_loss=0.008932, over 14970.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08988, pruned_loss=0.01208, audio_tagging_loss=0.008946, over 3042344.98 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:57:33,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3504840.0, ans=0.0 2023-11-26 17:57:41,079 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.922e+01 9.364e+01 9.934e+01 1.569e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 17:57:41,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3504906.6666666665, ans=0.125 2023-11-26 17:57:43,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2023-11-26 17:57:53,938 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525750 2023-11-26 17:57:55,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=15.0 2023-11-26 17:58:14,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3505106.6666666665, ans=0.2 2023-11-26 17:58:25,613 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8750, loss[loss=0.05514, simple_loss=0.0719, pruned_loss=0.009728, audio_tagging_loss=0.009463, over 14074.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09029, pruned_loss=0.01228, audio_tagging_loss=0.009029, over 3043426.71 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:58:33,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3505173.3333333335, ans=0.125 2023-11-26 17:58:50,424 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525800 2023-11-26 17:58:56,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3505306.6666666665, ans=0.1 2023-11-26 17:59:12,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3505440.0, ans=0.125 2023-11-26 17:59:17,834 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:59:21,328 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8800, loss[loss=0.04545, simple_loss=0.05851, pruned_loss=0.005644, audio_tagging_loss=0.01055, over 13727.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09006, pruned_loss=0.01218, audio_tagging_loss=0.009178, over 3045631.02 frames. 
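
[Editor's note] The `optim.py:476` entries report quartiles of recent gradient norms plus a clipping threshold and the fraction of steps clipped. The threshold tracks `Clipping_scale` (2.0) times the logged median (e.g. 2.0 * 9.364e+01 = 1.873e+02 in the 17:57:41 entry above); that rule is inferred from the numbers, not confirmed from the source. A sketch under that assumption:

```python
import torch

class GradNormClipper:
    """Window of recent grad norms; clip at clipping_scale * median."""
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.scale, self.window = clipping_scale, window
        self.norms: list[float] = []
        self.clipped = self.seen = 0

    def __call__(self, model: torch.nn.Module) -> float:
        # max_norm=inf only measures the total norm; we clip manually below.
        norm = torch.nn.utils.clip_grad_norm_(model.parameters(), float("inf"))
        self.norms = (self.norms + [float(norm)])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * float(q[2])  # 2.0 x running median
        self.seen += 1
        if float(norm) > threshold:
            self.clipped += 1
            for p in model.parameters():
                if p.grad is not None:
                    p.grad.mul_(threshold / float(norm))
        return 100.0 * self.clipped / self.seen  # "percent-clipped"
```
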
], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:59:28,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3505506.6666666665, ans=0.125 2023-11-26 17:59:32,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.888e+01 9.136e+01 9.670e+01 1.038e+02 1.622e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 17:59:40,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3505573.3333333335, ans=0.0 2023-11-26 17:59:40,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3505573.3333333335, ans=0.125 2023-11-26 17:59:43,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3505640.0, ans=0.04949747468305833 2023-11-26 17:59:45,711 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525850 2023-11-26 17:59:57,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3505706.6666666665, ans=0.2 2023-11-26 18:00:06,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3505773.3333333335, ans=0.125 2023-11-26 18:00:17,406 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8850, loss[loss=0.06498, simple_loss=0.0788, pruned_loss=0.01612, audio_tagging_loss=0.009454, over 15105.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09006, pruned_loss=0.01214, audio_tagging_loss=0.009121, over 3048550.11 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:00:30,102 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:00:39,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3505973.3333333335, ans=0.0 2023-11-26 18:00:41,268 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525900 2023-11-26 18:00:48,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3505973.3333333335, ans=0.125 2023-11-26 18:00:50,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3506040.0, ans=0.035 2023-11-26 18:00:52,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.87 vs. 
limit=22.5 2023-11-26 18:00:55,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3506040.0, ans=0.0 2023-11-26 18:01:02,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3506106.6666666665, ans=0.125 2023-11-26 18:01:08,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3506106.6666666665, ans=0.0 2023-11-26 18:01:12,214 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8900, loss[loss=0.06648, simple_loss=0.09531, pruned_loss=0.01245, audio_tagging_loss=0.00637, over 15072.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08982, pruned_loss=0.01212, audio_tagging_loss=0.008982, over 3049161.30 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:01:16,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2023-11-26 18:01:19,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3506173.3333333335, ans=0.125 2023-11-26 18:01:24,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.622e+01 8.724e+01 9.341e+01 9.971e+01 1.167e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 18:01:37,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525950 2023-11-26 18:01:43,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3506306.6666666665, ans=0.125 2023-11-26 18:01:51,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3506373.3333333335, ans=0.125 2023-11-26 18:01:52,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.78 vs. limit=22.5 2023-11-26 18:02:03,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3506440.0, ans=0.125 2023-11-26 18:02:07,767 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8950, loss[loss=0.08001, simple_loss=0.1181, pruned_loss=0.01509, audio_tagging_loss=0.005876, over 16217.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09054, pruned_loss=0.01215, audio_tagging_loss=0.008799, over 3060178.97 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:02:17,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3506506.6666666665, ans=0.1 2023-11-26 18:02:28,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.01 vs. limit=22.5 2023-11-26 18:02:32,640 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526000 2023-11-26 18:03:01,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3506773.3333333335, ans=0.0 2023-11-26 18:03:04,594 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9000, loss[loss=0.08433, simple_loss=0.1168, pruned_loss=0.01767, audio_tagging_loss=0.008262, over 15493.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09073, pruned_loss=0.01227, audio_tagging_loss=0.0087, over 3061363.10 frames. 
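
[Editor's note] The bulk of the `scaling.py:213` entries are `ScheduledFloat` values: hyperparameters such as `dropout_p`, `skip_rate`, and balancer `prob` that are functions of the (possibly fractional) `batch_count` rather than constants, typically ramping from an aggressive early value to a gentler steady-state one. A minimal stand-in; the breakpoints below are invented for illustration:

```python
class ScheduledFloat:
    """Piecewise-linear schedule over batch_count (breakpoints invented)."""
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3500306.6666666665))  # long past the ramp: 0.1
```
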
], batch size: 55, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:03:04,595 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 18:03:23,575 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8111, 4.9840, 5.0989, 4.9378], device='cuda:1') 2023-11-26 18:03:27,510 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.9452, 2.9249, 2.7754, 2.6800, 3.3665, 3.3369, 3.1552, 3.5875], device='cuda:1') 2023-11-26 18:03:36,861 INFO [train_asr.py:1267] (1/4) Epoch 44, validation: loss=0.05857, simple_loss=0.05054, pruned_loss=0.005271, audio_tagging_loss=0.02803, over 4681554.00 frames. 2023-11-26 18:03:36,862 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 18:03:38,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3506840.0, ans=0.09899494936611666 2023-11-26 18:03:47,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3506906.6666666665, ans=0.125 2023-11-26 18:03:50,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 9.015e+01 9.647e+01 1.018e+02 1.400e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 18:04:02,236 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526050 2023-11-26 18:04:02,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3506973.3333333335, ans=0.0 2023-11-26 18:04:08,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3506973.3333333335, ans=0.2 2023-11-26 18:04:25,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3507106.6666666665, ans=0.0 2023-11-26 18:04:32,807 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9050, loss[loss=0.06369, simple_loss=0.09004, pruned_loss=0.01165, audio_tagging_loss=0.007013, over 15148.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09066, pruned_loss=0.01237, audio_tagging_loss=0.00864, over 3059640.85 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:04:40,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3507173.3333333335, ans=0.125 2023-11-26 18:04:40,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3507173.3333333335, ans=0.0 2023-11-26 18:04:42,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.54 vs. limit=22.5 2023-11-26 18:04:43,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3507240.0, ans=0.125 2023-11-26 18:04:57,318 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526100 2023-11-26 18:05:07,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3507373.3333333335, ans=0.0 2023-11-26 18:05:10,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. 
limit=15.0 2023-11-26 18:05:15,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3507373.3333333335, ans=0.0 2023-11-26 18:05:29,287 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9100, loss[loss=0.08096, simple_loss=0.1164, pruned_loss=0.01631, audio_tagging_loss=0.006468, over 15423.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09121, pruned_loss=0.01242, audio_tagging_loss=0.008572, over 3054569.04 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:05:30,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2023-11-26 18:05:35,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3507506.6666666665, ans=0.125 2023-11-26 18:05:41,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.875e+01 9.431e+01 1.015e+02 1.268e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 18:05:43,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3507573.3333333335, ans=0.125 2023-11-26 18:05:44,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2023-11-26 18:05:45,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3507573.3333333335, ans=0.1 2023-11-26 18:05:53,187 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526150 2023-11-26 18:05:55,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2023-11-26 18:06:00,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3507640.0, ans=0.0 2023-11-26 18:06:08,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3507706.6666666665, ans=0.125 2023-11-26 18:06:12,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3507706.6666666665, ans=0.125 2023-11-26 18:06:21,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3507773.3333333335, ans=0.125 2023-11-26 18:06:24,509 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9150, loss[loss=0.07784, simple_loss=0.1045, pruned_loss=0.02015, audio_tagging_loss=0.005434, over 15706.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09209, pruned_loss=0.01255, audio_tagging_loss=0.00844, over 3056489.31 frames. 
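
[Editor's note] During the validation passes, `zipformer.py:1877` prints `attn_weights_entropy` per attention head, a quick diagnostic for head collapse: entropy near zero means a head attends to a single position, while values around 4-5 (as in this log) indicate broad attention. The computation itself is standard:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), each row summing to 1.
    # Returns the mean entropy per head, as in the validation diagnostics.
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # one entropy value per head
```
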
], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:06:50,122 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526200 2023-11-26 18:07:06,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3508040.0, ans=0.125 2023-11-26 18:07:19,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3508173.3333333335, ans=0.09899494936611666 2023-11-26 18:07:20,631 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9200, loss[loss=0.05148, simple_loss=0.0683, pruned_loss=0.007682, audio_tagging_loss=0.009649, over 14859.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09174, pruned_loss=0.01257, audio_tagging_loss=0.008434, over 3054276.52 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:07:34,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.887e+01 9.428e+01 1.018e+02 1.949e+02, threshold=1.886e+02, percent-clipped=1.0 2023-11-26 18:07:38,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3508240.0, ans=0.07 2023-11-26 18:07:39,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3508240.0, ans=0.1 2023-11-26 18:07:43,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. limit=10.0 2023-11-26 18:07:45,607 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526250 2023-11-26 18:07:50,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=12.0 2023-11-26 18:07:51,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3508306.6666666665, ans=0.125 2023-11-26 18:07:59,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2023-11-26 18:08:17,407 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9250, loss[loss=0.07188, simple_loss=0.1016, pruned_loss=0.0132, audio_tagging_loss=0.007884, over 15193.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09024, pruned_loss=0.01242, audio_tagging_loss=0.008472, over 3058759.35 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:08:22,976 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:08:31,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.02 vs. 
limit=15.0 2023-11-26 18:08:35,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3508573.3333333335, ans=0.04949747468305833 2023-11-26 18:08:40,776 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526300 2023-11-26 18:09:08,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3508773.3333333335, ans=0.0 2023-11-26 18:09:09,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3508773.3333333335, ans=0.125 2023-11-26 18:09:12,316 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9300, loss[loss=0.06895, simple_loss=0.08841, pruned_loss=0.0164, audio_tagging_loss=0.008348, over 14344.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08958, pruned_loss=0.01237, audio_tagging_loss=0.008536, over 3058121.95 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:09:20,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.73 vs. limit=22.5 2023-11-26 18:09:25,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.680e+01 9.500e+01 1.023e+02 1.279e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 18:09:26,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2023-11-26 18:09:27,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3508906.6666666665, ans=0.125 2023-11-26 18:09:37,817 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526350 2023-11-26 18:09:41,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3508973.3333333335, ans=0.0 2023-11-26 18:09:51,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3509040.0, ans=0.125 2023-11-26 18:09:52,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3509040.0, ans=0.125 2023-11-26 18:10:05,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3509106.6666666665, ans=0.125 2023-11-26 18:10:07,396 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9350, loss[loss=0.06351, simple_loss=0.08737, pruned_loss=0.01117, audio_tagging_loss=0.008658, over 14805.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08931, pruned_loss=0.01218, audio_tagging_loss=0.008541, over 3051755.60 frames. 
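
[Editor's note] The "Computing validation loss" / "Maximum memory allocated" pair seen at batch 9000 comes from a periodic eval pass interleaved with training. A hedged sketch of that loop; `compute_losses` is a hypothetical stand-in for the recipe's actual loss code:

```python
import torch

def validate(model, dataloader, device="cuda:1"):
    # Periodic eval pass: frame-weighted average of each loss component.
    model.eval()
    totals, frames = {}, 0.0
    with torch.no_grad():
        for batch in dataloader:
            losses, n = compute_losses(model, batch)  # hypothetical helper
            for k, v in losses.items():
                totals[k] = totals.get(k, 0.0) + float(v) * n
            frames += n
    model.train()
    avgs = {k: v / frames for k, v in totals.items()}
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print("validation:", avgs)
    print(f"Maximum memory allocated so far is {mb}MB")
```
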
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:10:21,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3509240.0, ans=0.125 2023-11-26 18:10:33,005 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526400 2023-11-26 18:10:35,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3509306.6666666665, ans=0.125 2023-11-26 18:11:00,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3509440.0, ans=0.125 2023-11-26 18:11:02,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.68 vs. limit=15.0 2023-11-26 18:11:04,585 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9400, loss[loss=0.05856, simple_loss=0.08053, pruned_loss=0.0117, audio_tagging_loss=0.006588, over 14710.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08957, pruned_loss=0.01228, audio_tagging_loss=0.008632, over 3052571.02 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:11:11,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3509506.6666666665, ans=0.125 2023-11-26 18:11:15,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3509573.3333333335, ans=0.1 2023-11-26 18:11:17,205 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.876e+01 9.519e+01 1.023e+02 1.284e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 18:11:27,910 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526450 2023-11-26 18:11:42,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3509706.6666666665, ans=0.0 2023-11-26 18:11:44,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3509706.6666666665, ans=0.0 2023-11-26 18:11:59,794 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9450, loss[loss=0.054, simple_loss=0.06589, pruned_loss=0.01018, audio_tagging_loss=0.01088, over 17110.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08886, pruned_loss=0.01217, audio_tagging_loss=0.008817, over 3053451.45 frames. ], batch size: 65, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:12:00,893 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 18:12:09,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3509906.6666666665, ans=0.1 2023-11-26 18:12:13,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3509906.6666666665, ans=0.125 2023-11-26 18:12:24,992 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526500 2023-11-26 18:12:31,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3509973.3333333335, ans=0.1 2023-11-26 18:12:44,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3510106.6666666665, ans=0.05 2023-11-26 18:12:55,192 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9500, loss[loss=0.07769, simple_loss=0.1138, pruned_loss=0.01313, audio_tagging_loss=0.007649, over 16802.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08869, pruned_loss=0.01209, audio_tagging_loss=0.008785, over 3049465.22 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:12:59,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0 2023-11-26 18:13:09,443 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.837e+01 9.537e+01 1.034e+02 1.299e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 18:13:17,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3510306.6666666665, ans=0.125 2023-11-26 18:13:20,777 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526550 2023-11-26 18:13:26,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3510306.6666666665, ans=0.125 2023-11-26 18:13:41,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3510440.0, ans=0.125 2023-11-26 18:13:51,807 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9550, loss[loss=0.07047, simple_loss=0.09623, pruned_loss=0.01536, audio_tagging_loss=0.007, over 15197.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08982, pruned_loss=0.01232, audio_tagging_loss=0.008795, over 3055527.57 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:14:12,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3510640.0, ans=0.0 2023-11-26 18:14:15,667 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526600 2023-11-26 18:14:16,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.33 vs. 
limit=15.0 2023-11-26 18:14:18,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3510640.0, ans=0.125 2023-11-26 18:14:19,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3510640.0, ans=0.0 2023-11-26 18:14:22,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3510640.0, ans=0.0 2023-11-26 18:14:41,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3510773.3333333335, ans=0.0 2023-11-26 18:14:47,786 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9600, loss[loss=0.05285, simple_loss=0.06872, pruned_loss=0.009306, audio_tagging_loss=0.009189, over 14579.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09078, pruned_loss=0.0124, audio_tagging_loss=0.008796, over 3056861.95 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:14:56,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-11-26 18:15:00,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.909e+01 9.426e+01 1.006e+02 1.618e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 18:15:00,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3510906.6666666665, ans=0.125 2023-11-26 18:15:11,652 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526650 2023-11-26 18:15:25,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3511040.0, ans=0.125 2023-11-26 18:15:27,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3511040.0, ans=0.125 2023-11-26 18:15:38,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2023-11-26 18:15:39,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3511106.6666666665, ans=10.0 2023-11-26 18:15:40,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.71 vs. limit=10.0 2023-11-26 18:15:43,136 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9650, loss[loss=0.05195, simple_loss=0.07134, pruned_loss=0.006622, audio_tagging_loss=0.009655, over 14928.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0905, pruned_loss=0.01233, audio_tagging_loss=0.008813, over 3061728.94 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:15:57,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3511240.0, ans=0.125 2023-11-26 18:16:08,681 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526700 2023-11-26 18:16:40,152 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9700, loss[loss=0.0701, simple_loss=0.09321, pruned_loss=0.01384, audio_tagging_loss=0.009659, over 14374.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08984, pruned_loss=0.0122, audio_tagging_loss=0.008732, over 3059355.28 frames. 
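
[Editor's note] The `grad_scale` field drifts between 32, 16, and 8 across these batches; with `use_fp16` training that is dynamic loss scaling: the scaler halves the scale (and skips the step) whenever gradients overflow, and doubles it back after a stretch of clean steps. Standard `torch.cuda.amp` usage reproduces the behavior:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

for _ in range(10):
    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    opt.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(opt)   # skipped (and scale halved) on inf/NaN gradients
    scaler.update()
    print("grad_scale:", scaler.get_scale())
```
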
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:16:54,486 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.934e+01 9.029e+01 9.553e+01 1.029e+02 1.538e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 18:17:01,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3511640.0, ans=0.125 2023-11-26 18:17:04,055 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526750 2023-11-26 18:17:35,714 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9750, loss[loss=0.06299, simple_loss=0.08949, pruned_loss=0.01032, audio_tagging_loss=0.007917, over 15522.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08939, pruned_loss=0.01213, audio_tagging_loss=0.008672, over 3053880.94 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:17:39,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2023-11-26 18:17:45,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3511906.6666666665, ans=0.0 2023-11-26 18:17:48,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3511906.6666666665, ans=0.2 2023-11-26 18:17:51,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3511906.6666666665, ans=0.125 2023-11-26 18:17:55,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3511906.6666666665, ans=0.2 2023-11-26 18:17:58,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3511973.3333333335, ans=0.125 2023-11-26 18:17:59,901 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526800 2023-11-26 18:18:00,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3511973.3333333335, ans=0.125 2023-11-26 18:18:04,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3511973.3333333335, ans=0.025 2023-11-26 18:18:06,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3511973.3333333335, ans=0.05 2023-11-26 18:18:21,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3512106.6666666665, ans=0.0 2023-11-26 18:18:31,592 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9800, loss[loss=0.06271, simple_loss=0.08595, pruned_loss=0.008439, audio_tagging_loss=0.01129, over 15266.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08896, pruned_loss=0.01218, audio_tagging_loss=0.00872, over 3051237.18 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:18:45,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.863e+01 9.415e+01 1.026e+02 1.443e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 18:18:54,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3512306.6666666665, ans=0.125 2023-11-26 18:18:56,685 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526850 2023-11-26 18:19:00,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3512306.6666666665, ans=0.0 2023-11-26 18:19:08,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3512373.3333333335, ans=0.125 2023-11-26 18:19:12,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=22.5 2023-11-26 18:19:14,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3512373.3333333335, ans=0.125 2023-11-26 18:19:22,543 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:19:24,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-26 18:19:27,288 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9850, loss[loss=0.0621, simple_loss=0.08551, pruned_loss=0.01154, audio_tagging_loss=0.007806, over 16475.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08972, pruned_loss=0.01226, audio_tagging_loss=0.00877, over 3055243.62 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:19:27,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3512506.6666666665, ans=0.1 2023-11-26 18:19:40,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3512573.3333333335, ans=0.125 2023-11-26 18:19:48,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3512573.3333333335, ans=0.125 2023-11-26 18:19:52,224 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526900 2023-11-26 18:19:55,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3512640.0, ans=0.2 2023-11-26 18:20:17,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3512773.3333333335, ans=0.125 2023-11-26 18:20:23,502 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9900, loss[loss=0.0615, simple_loss=0.07467, pruned_loss=0.01495, audio_tagging_loss=0.009219, over 13843.00 frames. 
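
[Editor's note] The `Whitening` entries compare a per-module metric against a limit (`metric=14.41 vs. limit=22.5` above); a whitening penalty only engages once the metric exceeds the limit. The metric is plausibly a trace-based measure of how far the feature covariance is from white, e.g. E[lambda^2]/E[lambda]^2 over covariance eigenvalues, which equals 1 for perfectly white features and grows with eigenvalue spread. A hedged reconstruction under that assumption:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels). Assumed form: trace(C^2)/d divided by
    # (trace(C)/d)^2, i.e. E[lambda^2]/E[lambda]^2 without eigendecomposition.
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()      # trace(C)/d
    mean_eig_sq = (cov * cov).sum() / d        # trace(C^2)/d (C symmetric)
    return float(mean_eig_sq / mean_eig.clamp(min=1e-20) ** 2)

white = torch.randn(10000, 384)
print(whitening_metric(white))                                   # ~1.0
print(whitening_metric(white * torch.linspace(0.1, 3.0, 384)))   # >> 1
```
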
], tot_loss[loss=0.06615, simple_loss=0.09013, pruned_loss=0.01235, audio_tagging_loss=0.008736, over 3051187.52 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:20:25,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2023-11-26 18:20:27,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2023-11-26 18:20:37,752 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.893e+01 9.553e+01 1.033e+02 1.550e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 18:20:39,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0 2023-11-26 18:20:47,271 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526950 2023-11-26 18:21:04,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3513040.0, ans=0.0 2023-11-26 18:21:19,082 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9950, loss[loss=0.05254, simple_loss=0.06756, pruned_loss=0.007829, audio_tagging_loss=0.01094, over 15804.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08991, pruned_loss=0.01233, audio_tagging_loss=0.008657, over 3052597.66 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:21:32,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3513240.0, ans=0.07 2023-11-26 18:21:44,397 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527000 2023-11-26 18:21:50,114 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:21:54,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3513373.3333333335, ans=0.125 2023-11-26 18:22:04,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3513440.0, ans=0.125 2023-11-26 18:22:06,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3513440.0, ans=0.035 2023-11-26 18:22:06,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3513440.0, ans=0.125 2023-11-26 18:22:08,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3513440.0, ans=0.125 2023-11-26 18:22:15,650 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10000, loss[loss=0.0677, simple_loss=0.09413, pruned_loss=0.0114, audio_tagging_loss=0.00924, over 16053.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08885, pruned_loss=0.01209, audio_tagging_loss=0.00862, over 3054588.75 frames. 
], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:22:29,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.698e+01 9.339e+01 1.006e+02 1.184e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 18:22:34,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3513573.3333333335, ans=0.0 2023-11-26 18:22:39,663 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527050 2023-11-26 18:22:39,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3513640.0, ans=0.125 2023-11-26 18:22:44,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3513640.0, ans=0.09899494936611666 2023-11-26 18:22:52,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3513706.6666666665, ans=0.125 2023-11-26 18:23:02,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3513773.3333333335, ans=0.07 2023-11-26 18:23:05,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3513773.3333333335, ans=0.0 2023-11-26 18:23:11,565 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10050, loss[loss=0.08791, simple_loss=0.1226, pruned_loss=0.02092, audio_tagging_loss=0.005685, over 14903.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08904, pruned_loss=0.01208, audio_tagging_loss=0.008572, over 3053488.54 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:23:12,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3513840.0, ans=0.0 2023-11-26 18:23:29,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3513906.6666666665, ans=0.125 2023-11-26 18:23:30,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3513906.6666666665, ans=0.125 2023-11-26 18:23:35,634 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527100 2023-11-26 18:23:38,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3513973.3333333335, ans=0.125 2023-11-26 18:23:42,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2023-11-26 18:23:51,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3514040.0, ans=0.1 2023-11-26 18:23:52,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3514040.0, ans=0.125 2023-11-26 18:23:56,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2023-11-26 18:24:06,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.50 vs. 
limit=22.5 2023-11-26 18:24:06,824 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10100, loss[loss=0.07506, simple_loss=0.1052, pruned_loss=0.0151, audio_tagging_loss=0.007379, over 14773.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08949, pruned_loss=0.01223, audio_tagging_loss=0.008587, over 3053691.33 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:24:21,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.926e+01 9.577e+01 1.044e+02 1.166e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 18:24:32,290 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527150 2023-11-26 18:24:51,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3514440.0, ans=0.0 2023-11-26 18:24:53,876 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:25:02,354 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10150, loss[loss=0.06665, simple_loss=0.08993, pruned_loss=0.009188, audio_tagging_loss=0.01249, over 16299.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08916, pruned_loss=0.0123, audio_tagging_loss=0.008645, over 3053857.06 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:25:10,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3514506.6666666665, ans=0.125 2023-11-26 18:25:13,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3514573.3333333335, ans=0.04949747468305833 2023-11-26 18:25:14,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3514573.3333333335, ans=0.125 2023-11-26 18:25:15,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3514573.3333333335, ans=0.2 2023-11-26 18:25:21,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3514573.3333333335, ans=0.1 2023-11-26 18:25:26,950 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527200 2023-11-26 18:25:30,459 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 18:25:30,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3514640.0, ans=0.125 2023-11-26 18:25:32,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3514640.0, ans=0.02 2023-11-26 18:25:44,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3514706.6666666665, ans=0.0 2023-11-26 18:25:46,081 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:25:58,624 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10200, loss[loss=0.05707, simple_loss=0.07859, pruned_loss=0.009363, audio_tagging_loss=0.008416, over 15248.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08907, pruned_loss=0.01218, audio_tagging_loss=0.008699, over 3064594.58 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:26:01,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3514840.0, ans=0.02 2023-11-26 18:26:13,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 9.101e+01 9.803e+01 1.043e+02 1.180e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-26 18:26:19,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3514973.3333333335, ans=0.025 2023-11-26 18:26:20,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3514973.3333333335, ans=0.2 2023-11-26 18:26:21,417 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:26:22,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527250 2023-11-26 18:26:40,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2023-11-26 18:26:41,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3515106.6666666665, ans=0.1 2023-11-26 18:26:44,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3515106.6666666665, ans=0.0 2023-11-26 18:26:51,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2023-11-26 18:26:53,044 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10250, loss[loss=0.06474, simple_loss=0.08775, pruned_loss=0.012, audio_tagging_loss=0.008864, over 15294.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08879, pruned_loss=0.01214, audio_tagging_loss=0.008815, over 3059960.65 frames. 
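The scaling.py:213 entries that dominate this log track ScheduledFloat values: module hyperparameters (balancer probabilities, skip rates, dropout probabilities) that are functions of batch_count rather than fixed constants, with ans giving the current value. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are illustrative:

```python
# Sketch of a batch-count-indexed float schedule in the spirit of the
# ScheduledFloat entries above; interpolation rule and breakpoints assumed.
def scheduled_float(batch_count: float, points) -> float:
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]  # hold the final value, as at these batch counts

# e.g. a balancer prob relaxing from 0.3 early in training to the 0.125
# logged at batch_count=3514640.0:
print(scheduled_float(3514640.0, [(0.0, 0.3), (20000.0, 0.125)]))  # 0.125
```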
], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:26:57,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3515173.3333333335, ans=0.0 2023-11-26 18:27:12,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.92 vs. limit=10.0 2023-11-26 18:27:18,162 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527300 2023-11-26 18:27:21,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2023-11-26 18:27:23,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3515306.6666666665, ans=0.125 2023-11-26 18:27:24,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3515306.6666666665, ans=0.2 2023-11-26 18:27:25,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3515373.3333333335, ans=0.2 2023-11-26 18:27:31,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3515373.3333333335, ans=0.0 2023-11-26 18:27:48,567 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10300, loss[loss=0.05828, simple_loss=0.07964, pruned_loss=0.01153, audio_tagging_loss=0.006932, over 15392.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08995, pruned_loss=0.01235, audio_tagging_loss=0.008764, over 3067022.43 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:27:53,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3515506.6666666665, ans=0.0 2023-11-26 18:27:54,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0 2023-11-26 18:27:57,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2023-11-26 18:28:03,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=22.5 2023-11-26 18:28:05,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.902e+01 9.462e+01 1.025e+02 1.207e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 18:28:13,127 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527350 2023-11-26 18:28:16,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3515640.0, ans=0.125 2023-11-26 18:28:23,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3515706.6666666665, ans=0.125 2023-11-26 18:28:45,131 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10350, loss[loss=0.06835, simple_loss=0.08655, pruned_loss=0.01306, audio_tagging_loss=0.01201, over 15784.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09016, pruned_loss=0.01228, audio_tagging_loss=0.008835, over 3059040.70 frames. 
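In the optim.py:476 lines, the five grad-norm quartiles are presumably the min/25%/median/75%/max of a window of recent gradient norms, and in every entry here the threshold equals Clipping_scale times the median (2.0 x 9.462e+01 = 1.892e+02 just above), with percent-clipped the share of batches whose norm exceeded it. A sketch of that bookkeeping, using the five logged quartiles as a stand-in sample:

```python
import torch

# Stand-in sample: the five quartile values from the optim.py entry above.
recent_norms = torch.tensor([74.45, 89.02, 94.62, 102.5, 120.7])
clipping_scale = 2.0
threshold = clipping_scale * recent_norms.median()  # 2.0 x median
percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
print(f"{threshold:.2f} {percent_clipped:.1f}")  # 189.24 0.0, as logged
```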
], batch size: 59, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:28:46,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3515840.0, ans=0.2 2023-11-26 18:28:52,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3515840.0, ans=0.125 2023-11-26 18:29:03,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3515906.6666666665, ans=0.0 2023-11-26 18:29:08,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527400 2023-11-26 18:29:40,532 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10400, loss[loss=0.06304, simple_loss=0.0866, pruned_loss=0.01085, audio_tagging_loss=0.008894, over 15574.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08952, pruned_loss=0.01216, audio_tagging_loss=0.008921, over 3051360.58 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:29:57,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.067e+01 9.465e+01 1.042e+02 1.301e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 18:29:58,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3516240.0, ans=0.2 2023-11-26 18:30:02,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=22.5 2023-11-26 18:30:05,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527450 2023-11-26 18:30:24,382 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:30:35,794 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10450, loss[loss=0.06557, simple_loss=0.09015, pruned_loss=0.0123, audio_tagging_loss=0.008196, over 15843.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09036, pruned_loss=0.01219, audio_tagging_loss=0.008825, over 3061410.70 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:30:57,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3516573.3333333335, ans=0.125 2023-11-26 18:31:01,430 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527500 2023-11-26 18:31:01,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0 2023-11-26 18:31:05,814 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:31:10,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3516706.6666666665, ans=0.125 2023-11-26 18:31:11,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3516706.6666666665, ans=0.1 2023-11-26 18:31:16,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. 
limit=15.0 2023-11-26 18:31:17,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3516706.6666666665, ans=0.125 2023-11-26 18:31:27,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3516773.3333333335, ans=0.2 2023-11-26 18:31:30,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3516773.3333333335, ans=0.2 2023-11-26 18:31:33,197 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10500, loss[loss=0.06211, simple_loss=0.0803, pruned_loss=0.01215, audio_tagging_loss=0.00981, over 14753.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09004, pruned_loss=0.0122, audio_tagging_loss=0.008783, over 3059151.57 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:31:36,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3516840.0, ans=0.0 2023-11-26 18:31:45,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.74 vs. limit=15.0 2023-11-26 18:31:48,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.975e+01 9.604e+01 1.035e+02 1.568e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-26 18:31:56,519 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527550 2023-11-26 18:32:27,957 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10550, loss[loss=0.06252, simple_loss=0.08679, pruned_loss=0.01101, audio_tagging_loss=0.008119, over 14789.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08971, pruned_loss=0.01219, audio_tagging_loss=0.008701, over 3047776.62 frames. 
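The scaling.py:1022 Whitening lines monitor a per-module statistic against a limit. A plausible reading, an assumption rather than the actual scaling.py code, is a measure of how anisotropic the channel covariance of the activations is: 1.0 for perfectly white activations, growing with the eigenvalue spread, and constrained during training to stay around or below the logged limit:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels), single group for simplicity.
    mean(eig^2) / mean(eig)^2 of the channel covariance: 1.0 iff all
    eigenvalues are equal, i.e. the activations are perfectly 'white'."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(2000, 512)   # near-white activations
print(whitening_metric(x))   # ~1.3, far below limits like 15.0 above
```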
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:32:28,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3517173.3333333335, ans=0.125 2023-11-26 18:32:49,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3517306.6666666665, ans=0.0 2023-11-26 18:32:52,383 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527600 2023-11-26 18:32:57,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3517306.6666666665, ans=0.125 2023-11-26 18:33:01,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3517373.3333333335, ans=0.125 2023-11-26 18:33:07,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3517373.3333333335, ans=0.125 2023-11-26 18:33:08,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3517373.3333333335, ans=0.125 2023-11-26 18:33:14,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3517440.0, ans=0.0 2023-11-26 18:33:16,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3517440.0, ans=0.0 2023-11-26 18:33:17,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3517440.0, ans=0.125 2023-11-26 18:33:23,268 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10600, loss[loss=0.07208, simple_loss=0.1018, pruned_loss=0.014, audio_tagging_loss=0.007194, over 15136.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08966, pruned_loss=0.01227, audio_tagging_loss=0.008577, over 3042751.28 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:33:25,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3517506.6666666665, ans=0.125 2023-11-26 18:33:41,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.748e+01 9.260e+01 9.948e+01 1.249e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 18:33:45,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=15.0 2023-11-26 18:33:49,068 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527650 2023-11-26 18:33:55,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=12.0 2023-11-26 18:33:58,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3517706.6666666665, ans=0.05 2023-11-26 18:34:07,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3517773.3333333335, ans=0.2 2023-11-26 18:34:20,603 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10650, loss[loss=0.06965, simple_loss=0.09049, pruned_loss=0.01522, audio_tagging_loss=0.009181, over 14626.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08934, pruned_loss=0.01211, audio_tagging_loss=0.008625, over 3055518.61 frames. 
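The grad_scale field in the loss lines moves between 8.0 and 32.0 across this stretch, the signature of dynamic fp16 loss scaling: the scale is halved when an overflow is detected and doubled again after a run of clean steps. A schematic of the standard PyTorch machinery; the growth interval here is illustrative, not read from this run:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0, growth_factor=2.0, backoff_factor=0.5,
    growth_interval=2000,  # illustrative; the run's setting is not logged
)
# Schematic training step:
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)   # skipped when inf/nan gradients are found
#   scaler.update()          # halves the scale on overflow, else may double
```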
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:34:23,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3517840.0, ans=0.0 2023-11-26 18:34:27,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3517840.0, ans=0.125 2023-11-26 18:34:44,373 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527700 2023-11-26 18:34:48,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3517973.3333333335, ans=0.015 2023-11-26 18:35:05,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3518106.6666666665, ans=0.0 2023-11-26 18:35:10,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3518106.6666666665, ans=0.125 2023-11-26 18:35:15,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3518173.3333333335, ans=0.0 2023-11-26 18:35:16,068 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10700, loss[loss=0.07964, simple_loss=0.1228, pruned_loss=0.01033, audio_tagging_loss=0.007918, over 15225.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08905, pruned_loss=0.01217, audio_tagging_loss=0.008608, over 3046777.75 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:35:20,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2023-11-26 18:35:22,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=22.5 2023-11-26 18:35:29,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3518240.0, ans=0.125 2023-11-26 18:35:32,091 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 8.938e+01 9.482e+01 1.003e+02 1.497e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 18:35:35,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3518240.0, ans=0.1 2023-11-26 18:35:40,329 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527750 2023-11-26 18:35:40,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3518306.6666666665, ans=0.125 2023-11-26 18:35:41,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3518306.6666666665, ans=0.0 2023-11-26 18:35:55,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3518373.3333333335, ans=0.125 2023-11-26 18:36:04,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3518440.0, ans=0.125 2023-11-26 18:36:11,566 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10750, loss[loss=0.04178, simple_loss=0.05275, pruned_loss=0.005144, audio_tagging_loss=0.01027, over 14063.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08889, pruned_loss=0.01217, audio_tagging_loss=0.008637, over 3046266.34 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:36:18,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3518506.6666666665, ans=0.2 2023-11-26 18:36:18,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3518506.6666666665, ans=0.2 2023-11-26 18:36:32,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3518573.3333333335, ans=0.1 2023-11-26 18:36:37,355 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527800 2023-11-26 18:36:58,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3518773.3333333335, ans=0.125 2023-11-26 18:37:08,286 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10800, loss[loss=0.05658, simple_loss=0.07244, pruned_loss=0.012, audio_tagging_loss=0.008361, over 14802.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08885, pruned_loss=0.01211, audio_tagging_loss=0.008574, over 3050588.91 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:37:24,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3518906.6666666665, ans=0.125 2023-11-26 18:37:25,435 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.760e+01 9.363e+01 9.951e+01 1.169e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 18:37:33,063 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527850 2023-11-26 18:38:03,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-11-26 18:38:04,874 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10850, loss[loss=0.07981, simple_loss=0.103, pruned_loss=0.01803, audio_tagging_loss=0.0103, over 15959.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08882, pruned_loss=0.01219, audio_tagging_loss=0.008527, over 3049538.23 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:38:10,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3519173.3333333335, ans=0.2 2023-11-26 18:38:13,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.61 vs. 
limit=22.5 2023-11-26 18:38:28,347 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527900 2023-11-26 18:38:37,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3519373.3333333335, ans=0.125 2023-11-26 18:38:41,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3519373.3333333335, ans=0.125 2023-11-26 18:38:49,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3519440.0, ans=0.125 2023-11-26 18:38:50,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3519440.0, ans=0.1 2023-11-26 18:38:58,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-26 18:38:59,175 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:39:00,248 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10900, loss[loss=0.08324, simple_loss=0.116, pruned_loss=0.01759, audio_tagging_loss=0.007653, over 15393.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08765, pruned_loss=0.01202, audio_tagging_loss=0.008675, over 3050007.94 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:39:06,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3519506.6666666665, ans=0.125 2023-11-26 18:39:18,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.701e+01 9.472e+01 1.014e+02 1.998e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 18:39:25,357 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527950 2023-11-26 18:39:37,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3519706.6666666665, ans=0.05 2023-11-26 18:39:55,871 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10950, loss[loss=0.08078, simple_loss=0.1154, pruned_loss=0.01552, audio_tagging_loss=0.007557, over 16091.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08829, pruned_loss=0.01199, audio_tagging_loss=0.008589, over 3049644.58 frames. 
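Each of these WARNING lines drops an AudioSet clip whose placeholder transcript (24 tokens) is longer than the clip itself after subsampling (100 frames in, 23 out), leaving no valid transducer alignment. A sketch of the assumed filter; the length formula is chosen to be consistent with the logged 100 -> 23 reduction:

```python
# Assumed logic behind the "Exclude cut ..." warnings above.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2   # ~4x subsampling; 100 -> 23

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Condition implied by the warning: at least one frame per token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(keep_cut(100, 24))  # False -> "Exclude cut ... from training."
```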
], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:40:05,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3519840.0, ans=0.0 2023-11-26 18:40:07,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3519906.6666666665, ans=0.125 2023-11-26 18:40:11,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3519906.6666666665, ans=0.1 2023-11-26 18:40:21,077 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528000 2023-11-26 18:40:22,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3519973.3333333335, ans=0.0 2023-11-26 18:40:29,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3519973.3333333335, ans=0.95 2023-11-26 18:40:31,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3520040.0, ans=10.0 2023-11-26 18:40:37,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3520040.0, ans=0.0 2023-11-26 18:40:41,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3520040.0, ans=0.125 2023-11-26 18:40:42,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3520106.6666666665, ans=0.125 2023-11-26 18:40:54,836 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11000, loss[loss=0.08041, simple_loss=0.1028, pruned_loss=0.01953, audio_tagging_loss=0.009504, over 14870.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.0884, pruned_loss=0.01204, audio_tagging_loss=0.008701, over 3047859.86 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:40:55,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=15.0 2023-11-26 18:40:56,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3520173.3333333335, ans=0.1 2023-11-26 18:40:58,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3520173.3333333335, ans=0.125 2023-11-26 18:41:02,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-26 18:41:06,042 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 18:41:12,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.767e+01 9.294e+01 1.014e+02 1.282e+02, threshold=1.859e+02, percent-clipped=1.0 2023-11-26 18:41:17,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3520306.6666666665, ans=0.1 2023-11-26 18:41:18,918 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528050 2023-11-26 18:41:20,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.72 vs. limit=8.0 2023-11-26 18:41:31,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3520373.3333333335, ans=0.0 2023-11-26 18:41:45,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3520440.0, ans=0.2 2023-11-26 18:41:49,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.86 vs. limit=10.0 2023-11-26 18:41:50,612 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11050, loss[loss=0.07038, simple_loss=0.09522, pruned_loss=0.01256, audio_tagging_loss=0.0102, over 16625.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08858, pruned_loss=0.01215, audio_tagging_loss=0.008749, over 3041859.62 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:42:01,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3520573.3333333335, ans=0.125 2023-11-26 18:42:04,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.53 vs. limit=10.0 2023-11-26 18:42:15,738 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528100 2023-11-26 18:42:28,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3520706.6666666665, ans=0.0 2023-11-26 18:42:29,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2023-11-26 18:42:37,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3520773.3333333335, ans=0.125 2023-11-26 18:42:46,078 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11100, loss[loss=0.07526, simple_loss=0.1004, pruned_loss=0.01573, audio_tagging_loss=0.009309, over 15968.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08885, pruned_loss=0.01216, audio_tagging_loss=0.008806, over 3050962.88 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:43:04,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 9.164e+01 1.008e+02 1.089e+02 1.427e+02, threshold=2.015e+02, percent-clipped=0.0 2023-11-26 18:43:11,697 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528150 2023-11-26 18:43:13,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3520973.3333333335, ans=0.125 2023-11-26 18:43:29,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.16 vs. 
limit=15.0 2023-11-26 18:43:43,192 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11150, loss[loss=0.07755, simple_loss=0.1101, pruned_loss=0.01562, audio_tagging_loss=0.006865, over 16581.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08862, pruned_loss=0.01197, audio_tagging_loss=0.008996, over 3051202.56 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:43:47,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-26 18:44:04,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3521306.6666666665, ans=0.1 2023-11-26 18:44:07,145 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528200 2023-11-26 18:44:25,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3521373.3333333335, ans=0.0 2023-11-26 18:44:39,049 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11200, loss[loss=0.07977, simple_loss=0.1084, pruned_loss=0.01515, audio_tagging_loss=0.01042, over 15117.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08759, pruned_loss=0.01183, audio_tagging_loss=0.009115, over 3045936.83 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:44:39,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=12.0 2023-11-26 18:44:50,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3521573.3333333335, ans=0.125 2023-11-26 18:44:58,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.872e+01 9.383e+01 1.014e+02 1.200e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 18:45:04,151 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528250 2023-11-26 18:45:07,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3521640.0, ans=0.125 2023-11-26 18:45:15,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2023-11-26 18:45:34,285 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11250, loss[loss=0.05367, simple_loss=0.07275, pruned_loss=0.008744, audio_tagging_loss=0.00855, over 16604.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08792, pruned_loss=0.01187, audio_tagging_loss=0.009047, over 3050814.81 frames. 
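The balancer entries scattered through these lines (prob, min_positive, max_abs, min_abs) suggest per-channel constraints on activation statistics that are enforced only on a random fraction of steps, with prob scheduled down to values like 0.125. A hedged sketch that merely detects violations; presumably the real module corrects gradients rather than reporting:

```python
import torch

# Assumed reading of the balancer entries; detection only, no correction.
def violating_channels(x, min_positive=0.05, max_abs=10.0):
    frac_pos = (x > 0).float().mean(dim=0)  # fraction of positive values
    mean_abs = x.abs().mean(dim=0)          # mean magnitude per channel
    return (frac_pos < min_positive) | (mean_abs > max_abs)

x = torch.relu(torch.randn(1000, 256))      # (frames, channels)
print(int(violating_channels(x).sum()))     # 0 for healthy activations
```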
], batch size: 61, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:45:36,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3521840.0, ans=0.0 2023-11-26 18:45:37,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3521840.0, ans=0.2 2023-11-26 18:45:46,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3521906.6666666665, ans=0.0 2023-11-26 18:45:59,605 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528300 2023-11-26 18:46:06,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3521973.3333333335, ans=0.0 2023-11-26 18:46:19,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3522106.6666666665, ans=0.5 2023-11-26 18:46:22,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3522106.6666666665, ans=0.125 2023-11-26 18:46:31,391 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11300, loss[loss=0.05807, simple_loss=0.07509, pruned_loss=0.01155, audio_tagging_loss=0.008975, over 15098.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08883, pruned_loss=0.01215, audio_tagging_loss=0.008767, over 3049869.86 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:46:48,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3522240.0, ans=0.0 2023-11-26 18:46:50,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3522240.0, ans=0.1 2023-11-26 18:46:51,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.924e+01 9.340e+01 1.002e+02 1.202e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 18:46:55,420 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528350 2023-11-26 18:46:58,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3522306.6666666665, ans=0.125 2023-11-26 18:47:00,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3522306.6666666665, ans=0.0 2023-11-26 18:47:17,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3522440.0, ans=0.0 2023-11-26 18:47:26,473 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11350, loss[loss=0.06286, simple_loss=0.08824, pruned_loss=0.01317, audio_tagging_loss=0.00557, over 15243.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08954, pruned_loss=0.01232, audio_tagging_loss=0.008651, over 3058305.79 frames. 
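The bypass.scale_min entries above (ans=0.2), together with the bypass.skip_rate values logged nearby, point to residual bypass connections: each block output is a per-channel interpolation between the block's input and output, with the mixing weight kept at or above scale_min, and with whole blocks occasionally skipped during training. A sketch under those assumptions:

```python
import torch

def bypass(x, f_x, scale, scale_min=0.2, skip_rate=0.0, training=True):
    # Assumed semantics of the bypass entries; names are illustrative.
    if training and torch.rand(()) < skip_rate:
        return x                               # stochastic depth: skip block
    s = scale.clamp(min=scale_min, max=1.0)    # per-channel mixing weight
    return x + s * (f_x - x)

x, f_x = torch.randn(10, 256), torch.randn(10, 256)
print(bypass(x, f_x, scale=torch.full((256,), 0.5)).shape)  # (10, 256)
```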
], batch size: 57, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:47:29,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3522506.6666666665, ans=0.0 2023-11-26 18:47:32,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3522506.6666666665, ans=0.125 2023-11-26 18:47:36,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3522573.3333333335, ans=0.0 2023-11-26 18:47:43,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3522573.3333333335, ans=0.125 2023-11-26 18:47:43,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.31 vs. limit=10.0 2023-11-26 18:47:51,569 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528400 2023-11-26 18:47:53,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3522640.0, ans=0.125 2023-11-26 18:48:00,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3522706.6666666665, ans=0.125 2023-11-26 18:48:04,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3522706.6666666665, ans=0.0 2023-11-26 18:48:08,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3522706.6666666665, ans=0.125 2023-11-26 18:48:22,543 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11400, loss[loss=0.0831, simple_loss=0.1094, pruned_loss=0.01981, audio_tagging_loss=0.008598, over 15200.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08992, pruned_loss=0.01239, audio_tagging_loss=0.008581, over 3057481.51 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:48:25,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3522840.0, ans=0.125 2023-11-26 18:48:31,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2023-11-26 18:48:35,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3522906.6666666665, ans=0.1 2023-11-26 18:48:43,230 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.848e+01 9.001e+01 9.594e+01 1.048e+02 1.378e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 18:48:44,613 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:48:47,603 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528450 2023-11-26 18:49:10,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3523106.6666666665, ans=0.0 2023-11-26 18:49:16,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3523106.6666666665, ans=0.125 2023-11-26 18:49:19,514 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11450, loss[loss=0.08153, simple_loss=0.1049, pruned_loss=0.01945, audio_tagging_loss=0.009645, over 15093.00 frames. 
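The scaling.py:1118 WithLoss lines report an auxiliary penalty attached directly to attention weights (zero at these steps). A standard way to attach a loss to an internal activation without changing its forward value is a custom autograd function; the sketch below is an assumption about the mechanism, not the actual WithLoss code:

```python
import torch

class WithAuxLoss(torch.autograd.Function):
    """Identity in forward; backward adds the gradient of a penalty on x,
    so the penalty trains the activation without joining the main loss."""
    @staticmethod
    def forward(ctx, x, scale):
        ctx.save_for_backward(x)
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():
            xd = x.detach().requires_grad_(True)
            aux = ctx.scale * (xd ** 2).mean()        # toy penalty
            (aux_grad,) = torch.autograd.grad(aux, xd)
        return grad_out + aux_grad, None

x = torch.randn(4, 4, requires_grad=True)
WithAuxLoss.apply(x, 0.1).sum().backward()
print(x.grad.shape)  # gradients now include the auxiliary term
```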
], tot_loss[loss=0.06614, simple_loss=0.09018, pruned_loss=0.0124, audio_tagging_loss=0.008647, over 3052403.47 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:49:21,925 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:49:40,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3523306.6666666665, ans=0.1 2023-11-26 18:49:42,821 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528500 2023-11-26 18:49:49,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3523306.6666666665, ans=0.0 2023-11-26 18:49:58,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3523373.3333333335, ans=0.0 2023-11-26 18:50:11,605 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:50:14,553 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11500, loss[loss=0.06103, simple_loss=0.08031, pruned_loss=0.01273, audio_tagging_loss=0.008149, over 15490.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08961, pruned_loss=0.01246, audio_tagging_loss=0.008595, over 3049210.78 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:50:15,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3523506.6666666665, ans=0.0 2023-11-26 18:50:21,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3523506.6666666665, ans=0.2 2023-11-26 18:50:26,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3523573.3333333335, ans=0.125 2023-11-26 18:50:34,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.724e+01 9.278e+01 1.017e+02 1.417e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 18:50:39,561 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528550 2023-11-26 18:50:51,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.10 vs. limit=15.0 2023-11-26 18:50:59,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3523773.3333333335, ans=0.125 2023-11-26 18:51:02,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-11-26 18:51:04,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3523773.3333333335, ans=0.04949747468305833 2023-11-26 18:51:09,835 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11550, loss[loss=0.0777, simple_loss=0.1131, pruned_loss=0.01339, audio_tagging_loss=0.007772, over 15764.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.0902, pruned_loss=0.01251, audio_tagging_loss=0.008527, over 3042295.34 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:51:21,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. 
limit=15.0 2023-11-26 18:51:27,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3523906.6666666665, ans=0.0 2023-11-26 18:51:30,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3523906.6666666665, ans=0.0 2023-11-26 18:51:32,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3523973.3333333335, ans=0.1 2023-11-26 18:51:35,405 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528600 2023-11-26 18:51:47,430 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:51:58,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3524106.6666666665, ans=0.125 2023-11-26 18:52:07,222 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11600, loss[loss=0.09401, simple_loss=0.1347, pruned_loss=0.01935, audio_tagging_loss=0.00729, over 16711.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09128, pruned_loss=0.0127, audio_tagging_loss=0.008477, over 3041185.84 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:52:11,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2023-11-26 18:52:17,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3524240.0, ans=0.1 2023-11-26 18:52:18,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3524240.0, ans=0.0 2023-11-26 18:52:26,811 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.975e+01 9.642e+01 1.029e+02 1.280e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 18:52:31,139 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528650 2023-11-26 18:52:42,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3524373.3333333335, ans=0.125 2023-11-26 18:53:02,904 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11650, loss[loss=0.06547, simple_loss=0.09574, pruned_loss=0.009714, audio_tagging_loss=0.007884, over 16261.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09159, pruned_loss=0.0128, audio_tagging_loss=0.00842, over 3039310.97 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:53:19,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3524573.3333333335, ans=0.125 2023-11-26 18:53:27,501 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528700 2023-11-26 18:53:42,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3524706.6666666665, ans=0.125 2023-11-26 18:53:56,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3524773.3333333335, ans=0.125 2023-11-26 18:53:57,960 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11700, loss[loss=0.06066, simple_loss=0.07799, pruned_loss=0.01165, audio_tagging_loss=0.01002, over 15850.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09063, pruned_loss=0.01251, audio_tagging_loss=0.00845, over 3043867.23 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:54:04,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3524840.0, ans=0.0 2023-11-26 18:54:19,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.034e+01 9.030e+01 9.498e+01 1.025e+02 1.677e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 18:54:23,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528750 2023-11-26 18:54:24,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3524973.3333333335, ans=0.125 2023-11-26 18:54:30,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3524973.3333333335, ans=0.125 2023-11-26 18:54:42,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2023-11-26 18:54:55,275 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11750, loss[loss=0.05044, simple_loss=0.07051, pruned_loss=0.008097, audio_tagging_loss=0.007094, over 15521.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09038, pruned_loss=0.01259, audio_tagging_loss=0.008507, over 3053077.06 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:55:01,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3525173.3333333335, ans=0.125 2023-11-26 18:55:03,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-11-26 18:55:09,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.53 vs. 
limit=15.0 2023-11-26 18:55:11,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3525240.0, ans=0.125 2023-11-26 18:55:19,249 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528800 2023-11-26 18:55:21,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3525306.6666666665, ans=0.125 2023-11-26 18:55:22,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3525306.6666666665, ans=0.2 2023-11-26 18:55:27,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3525373.3333333335, ans=0.125 2023-11-26 18:55:27,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3525373.3333333335, ans=0.125 2023-11-26 18:55:36,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3525373.3333333335, ans=0.0 2023-11-26 18:55:46,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.92 vs. limit=6.0 2023-11-26 18:55:51,222 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11800, loss[loss=0.05071, simple_loss=0.06233, pruned_loss=0.008145, audio_tagging_loss=0.0114, over 13432.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09018, pruned_loss=0.01278, audio_tagging_loss=0.008629, over 3046158.96 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:55:57,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2023-11-26 18:56:04,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3525573.3333333335, ans=0.2 2023-11-26 18:56:10,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 8.967e+01 9.583e+01 1.033e+02 1.275e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 18:56:14,766 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528850 2023-11-26 18:56:46,548 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11850, loss[loss=0.06089, simple_loss=0.08821, pruned_loss=0.01008, audio_tagging_loss=0.00671, over 16025.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.0912, pruned_loss=0.01274, audio_tagging_loss=0.008672, over 3044647.74 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:56:49,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3525840.0, ans=0.125 2023-11-26 18:57:05,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0 2023-11-26 18:57:12,181 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528900 2023-11-26 18:57:17,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3525973.3333333335, ans=0.0 2023-11-26 18:57:23,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.16 vs. 
limit=10.0 2023-11-26 18:57:36,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3526106.6666666665, ans=0.0 2023-11-26 18:57:42,552 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11900, loss[loss=0.05723, simple_loss=0.08008, pruned_loss=0.01035, audio_tagging_loss=0.006838, over 15448.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09066, pruned_loss=0.01255, audio_tagging_loss=0.008775, over 3047512.20 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:57:56,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2023-11-26 18:58:02,719 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.907e+01 9.565e+01 1.014e+02 1.302e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 18:58:07,181 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528950 2023-11-26 18:58:09,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.37 vs. limit=6.0 2023-11-26 18:58:32,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3526440.0, ans=15.0 2023-11-26 18:58:39,286 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11950, loss[loss=0.05884, simple_loss=0.08271, pruned_loss=0.01022, audio_tagging_loss=0.007266, over 15959.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09007, pruned_loss=0.01244, audio_tagging_loss=0.008817, over 3053069.74 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:58:57,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3526573.3333333335, ans=0.0 2023-11-26 18:59:02,685 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529000 2023-11-26 18:59:12,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3526706.6666666665, ans=0.07 2023-11-26 18:59:22,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3526706.6666666665, ans=0.125 2023-11-26 18:59:24,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3526773.3333333335, ans=0.0 2023-11-26 18:59:27,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3526773.3333333335, ans=10.0 2023-11-26 18:59:30,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3526773.3333333335, ans=0.125 2023-11-26 18:59:32,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3526773.3333333335, ans=0.1 2023-11-26 18:59:34,101 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 12000, loss[loss=0.06271, simple_loss=0.08336, pruned_loss=0.014, audio_tagging_loss=0.007033, over 15520.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09049, pruned_loss=0.01256, audio_tagging_loss=0.008876, over 3054053.49 frames. 
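In the validation passes just below, zipformer.py:1877 prints the entropy of each self_attn_weights module's attention distribution, one value per head (four or eight entries, matching the head counts of the different encoder stacks). A sketch of that diagnostic with illustrative shapes; the exact averaging used by zipformer.py is assumed:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (batch, num_heads, tgt_len, src_len), rows summing to 1."""
    ent = -(attn * (attn + 1.0e-20).log()).sum(dim=-1)  # per-row entropy
    return ent.mean(dim=(0, 2))                         # one value per head

attn = torch.softmax(torch.randn(2, 4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # 4 entries, cf. the tensors below
```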
], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:59:34,101 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 19:00:00,177 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3029, 4.2970, 4.4905, 4.4561], device='cuda:1') 2023-11-26 19:00:06,917 INFO [train_asr.py:1267] (1/4) Epoch 44, validation: loss=0.05801, simple_loss=0.05056, pruned_loss=0.005309, audio_tagging_loss=0.02742, over 4681554.00 frames. 2023-11-26 19:00:06,917 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 19:00:21,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3526906.6666666665, ans=0.0 2023-11-26 19:00:25,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.909e+01 9.466e+01 1.042e+02 1.234e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 19:00:29,525 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529050 2023-11-26 19:01:05,865 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 0, loss[loss=0.07286, simple_loss=0.08964, pruned_loss=0.009317, audio_tagging_loss=0.01872, over 15263.00 frames. ], tot_loss[loss=0.07286, simple_loss=0.08964, pruned_loss=0.009317, audio_tagging_loss=0.01872, over 15263.00 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:01:05,866 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 19:01:31,702 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8002, 5.8289, 5.8878, 5.8344], device='cuda:1') 2023-11-26 19:01:34,823 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9321, 1.6469, 3.4518, 2.9480, 2.8852, 3.1103, 3.0243, 3.1560], device='cuda:1') 2023-11-26 19:01:37,705 INFO [train_asr.py:1267] (1/4) Epoch 45, validation: loss=0.05755, simple_loss=0.05055, pruned_loss=0.005302, audio_tagging_loss=0.02697, over 4681554.00 frames. 2023-11-26 19:01:37,706 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 19:01:54,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3527080.0, ans=0.015 2023-11-26 19:01:57,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3527080.0, ans=0.0 2023-11-26 19:02:00,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3527146.6666666665, ans=0.0 2023-11-26 19:02:09,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3527146.6666666665, ans=0.0 2023-11-26 19:02:24,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3527280.0, ans=0.125 2023-11-26 19:02:28,768 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529100 2023-11-26 19:02:32,911 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 50, loss[loss=0.06629, simple_loss=0.07864, pruned_loss=0.009498, audio_tagging_loss=0.01747, over 15068.00 frames. ], tot_loss[loss=0.07387, simple_loss=0.08951, pruned_loss=0.01191, audio_tagging_loss=0.0172, over 694921.55 frames. 
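The loss[...] and tot_loss[...] entries above decompose the objective into simple_loss, pruned_loss and audio_tagging_loss, and the logged totals are consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (e.g. 0.5 * 0.08964 + 0.009317 + 0.01872 ~ 0.07286 for the epoch-45, batch-0 line, and likewise for the validation line). A sketch of that recombination; note the 0.5 simple-loss scale is inferred from the logged numbers, not taken from the training code:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    """Recombine the per-component losses the way the log totals suggest."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Check against the 'Epoch 45, batch 0' line above: loss=0.07286
assert abs(combined_loss(0.08964, 0.009317, 0.01872) - 0.07286) < 1e-4
```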
2023-11-26 19:02:44,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3527413.3333333335, ans=0.2
2023-11-26 19:02:51,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3527413.3333333335, ans=0.2
2023-11-26 19:02:59,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3527480.0, ans=0.1
2023-11-26 19:03:14,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3527546.6666666665, ans=0.125
2023-11-26 19:03:20,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.836e+01 9.859e+01 1.043e+02 1.139e+02 1.375e+02, threshold=2.086e+02, percent-clipped=0.0
2023-11-26 19:03:23,768 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529150
2023-11-26 19:03:28,439 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 100, loss[loss=0.06595, simple_loss=0.09151, pruned_loss=0.01047, audio_tagging_loss=0.009727, over 15785.00 frames. ], tot_loss[loss=0.07149, simple_loss=0.08738, pruned_loss=0.01158, audio_tagging_loss=0.01622, over 1213883.33 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:03:29,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3527680.0, ans=0.09899494936611666
2023-11-26 19:03:38,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3527746.6666666665, ans=0.07
2023-11-26 19:03:47,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3527746.6666666665, ans=0.125
2023-11-26 19:03:47,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3527746.6666666665, ans=0.125
2023-11-26 19:03:48,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3527746.6666666665, ans=0.1
2023-11-26 19:03:59,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3527813.3333333335, ans=0.0
2023-11-26 19:04:12,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3527946.6666666665, ans=0.1
2023-11-26 19:04:19,246 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529200
2023-11-26 19:04:23,708 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 150, loss[loss=0.0643, simple_loss=0.08258, pruned_loss=0.01193, audio_tagging_loss=0.01107, over 15271.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.08651, pruned_loss=0.01143, audio_tagging_loss=0.0146, over 1620297.08 frames. ], batch size: 61, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:04:26,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3528013.3333333335, ans=0.5
2023-11-26 19:04:44,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3528146.6666666665, ans=0.125
2023-11-26 19:04:47,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0
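Each [scaling.py:213] ScheduledFloat line prints the current value (ans) of a regularization knob as a function of batch_count; by this point in training values such as dropout_p=0.1 and prob=0.125 have evidently flattened out. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class and the example breakpoints below are illustrative assumptions, not the actual schedule:

```python
from bisect import bisect_right

class ScheduledFloatSketch:
    """A float hyperparameter that is a piecewise-linear function of batch count."""

    def __init__(self, *points):
        # points: (batch_count, value) breakpoints, e.g. (0.0, 0.3), (20000.0, 0.1)
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        i = bisect_right(xs, batch_count)
        if i == 0:
            return self.points[0][1]       # before the first breakpoint
        if i == len(self.points):
            return self.points[-1][1]      # after the last breakpoint
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)          # linear interpolation

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3525240.0))  # -> 0.1, i.e. flat at its final value late in training
```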
2023-11-26 19:04:55,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=22.5
2023-11-26 19:04:59,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3528213.3333333335, ans=0.125
2023-11-26 19:05:09,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3528280.0, ans=0.1
2023-11-26 19:05:11,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 9.212e+01 9.845e+01 1.053e+02 1.367e+02, threshold=1.969e+02, percent-clipped=0.0
2023-11-26 19:05:14,987 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529250
2023-11-26 19:05:19,230 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 200, loss[loss=0.06618, simple_loss=0.09151, pruned_loss=0.009527, audio_tagging_loss=0.01089, over 14658.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.08724, pruned_loss=0.01158, audio_tagging_loss=0.01297, over 1925992.55 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:05:30,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3528413.3333333335, ans=0.0
2023-11-26 19:05:30,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3528413.3333333335, ans=0.0
2023-11-26 19:05:35,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3528413.3333333335, ans=15.0
2023-11-26 19:05:40,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3528480.0, ans=0.125
2023-11-26 19:06:06,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3528613.3333333335, ans=0.0
2023-11-26 19:06:07,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3528613.3333333335, ans=0.0
2023-11-26 19:06:09,740 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529300
2023-11-26 19:06:13,968 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 250, loss[loss=0.07586, simple_loss=0.1024, pruned_loss=0.0154, audio_tagging_loss=0.009286, over 15064.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.08866, pruned_loss=0.0119, audio_tagging_loss=0.01163, over 2178119.42 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:06:19,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3528680.0, ans=0.0
2023-11-26 19:06:47,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3528880.0, ans=0.0
2023-11-26 19:06:52,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0
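The [scaling.py:1022] Whitening lines compare a per-module statistic (metric) against a scheduled whitening_limit (visible above as ...out_whiten.whitening_limit, ans=15.0); presumably the whitening penalty only engages once the metric exceeds the limit. A sketch of one metric with the right behavior, assuming it measures how far the per-group feature covariance is from a multiple of the identity (exactly 1.0 for perfectly "white" features); this is an illustrative reconstruction, not the library's exact formula:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Eigenvalue-dispersion statistic of the per-group feature covariance.

    x: (num_frames, num_channels) activations; num_channels % num_groups == 0.
    Returns E[lambda^2] / E[lambda]^2 computed via traces, averaged over
    groups: 1.0 when the covariance is c*I, larger when it is ill-conditioned.
    """
    num_frames, num_channels = x.shape
    ch = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, ch).transpose(0, 1)   # (G, T, ch)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames                    # (G, ch, ch)
    tr = cov.diagonal(dim1=1, dim2=2).sum(dim=-1)               # tr(C)
    tr_sq = (cov * cov).sum(dim=(1, 2))                         # tr(C^2) for symmetric C
    return (tr_sq * ch / tr.clamp(min=1e-20) ** 2).mean()
```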
2023-11-26 19:07:01,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3528946.6666666665, ans=0.1
2023-11-26 19:07:01,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 9.038e+01 9.703e+01 1.049e+02 1.454e+02, threshold=1.941e+02, percent-clipped=0.0
2023-11-26 19:07:05,770 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529350
2023-11-26 19:07:09,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3529013.3333333335, ans=0.125
2023-11-26 19:07:09,938 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 300, loss[loss=0.08423, simple_loss=0.1172, pruned_loss=0.01864, audio_tagging_loss=0.007001, over 15727.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09021, pruned_loss=0.01205, audio_tagging_loss=0.01072, over 2380388.26 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:07:20,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3529080.0, ans=0.1
2023-11-26 19:07:29,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=22.5
2023-11-26 19:07:32,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. limit=10.0
2023-11-26 19:07:57,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3529280.0, ans=0.1
2023-11-26 19:08:00,847 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529400
2023-11-26 19:08:05,794 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 350, loss[loss=0.08807, simple_loss=0.1181, pruned_loss=0.02115, audio_tagging_loss=0.007841, over 15909.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.08989, pruned_loss=0.01197, audio_tagging_loss=0.01015, over 2530888.40 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:08:11,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3529346.6666666665, ans=0.125
2023-11-26 19:08:18,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3529413.3333333335, ans=0.125
2023-11-26 19:08:25,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3529413.3333333335, ans=0.0
2023-11-26 19:08:26,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3529480.0, ans=0.0
2023-11-26 19:08:31,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3529480.0, ans=0.09899494936611666
2023-11-26 19:08:38,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=22.5
2023-11-26 19:08:42,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0
2023-11-26 19:08:45,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5
2023-11-26 19:08:51,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3529613.3333333335, ans=0.125
2023-11-26 19:08:53,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.894e+01 9.566e+01 1.035e+02 1.216e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-26 19:08:56,495 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529450
2023-11-26 19:09:00,676 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 400, loss[loss=0.07149, simple_loss=0.09428, pruned_loss=0.01552, audio_tagging_loss=0.008825, over 15907.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09052, pruned_loss=0.01211, audio_tagging_loss=0.00972, over 2648252.42 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 32.0
2023-11-26 19:09:21,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3529746.6666666665, ans=0.125
2023-11-26 19:09:27,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3529813.3333333335, ans=0.0
2023-11-26 19:09:32,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3529813.3333333335, ans=0.2
2023-11-26 19:09:52,822 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529500
2023-11-26 19:09:55,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3529946.6666666665, ans=0.125
2023-11-26 19:09:57,566 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 450, loss[loss=0.07154, simple_loss=0.1051, pruned_loss=0.01219, audio_tagging_loss=0.006781, over 15704.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08894, pruned_loss=0.01192, audio_tagging_loss=0.009579, over 2728111.26 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 32.0
2023-11-26 19:10:24,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3530146.6666666665, ans=0.125
2023-11-26 19:10:26,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3530146.6666666665, ans=0.125
2023-11-26 19:10:46,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0
2023-11-26 19:10:46,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.546e+01 9.095e+01 1.009e+02 1.358e+02, threshold=1.819e+02, percent-clipped=0.0
2023-11-26 19:10:48,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529550
2023-11-26 19:10:53,029 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 500, loss[loss=0.06662, simple_loss=0.09229, pruned_loss=0.01241, audio_tagging_loss=0.008061, over 15549.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08904, pruned_loss=0.01198, audio_tagging_loss=0.009351, over 2802927.31 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:11:13,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3530413.3333333335, ans=0.2
2023-11-26 19:11:23,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. limit=10.0
2023-11-26 19:11:28,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3530546.6666666665, ans=0.125
2023-11-26 19:11:35,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3530546.6666666665, ans=0.125
2023-11-26 19:11:44,880 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529600
2023-11-26 19:11:47,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3530613.3333333335, ans=0.1
2023-11-26 19:11:47,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.35 vs. limit=22.5
2023-11-26 19:11:49,390 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 550, loss[loss=0.05129, simple_loss=0.06681, pruned_loss=0.006901, audio_tagging_loss=0.01099, over 17084.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08888, pruned_loss=0.012, audio_tagging_loss=0.009236, over 2856130.78 frames. ], batch size: 65, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:11:59,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3530746.6666666665, ans=0.125
2023-11-26 19:12:06,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3530746.6666666665, ans=0.125
2023-11-26 19:12:11,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3530813.3333333335, ans=0.125
2023-11-26 19:12:12,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3530813.3333333335, ans=0.95
2023-11-26 19:12:24,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3530880.0, ans=0.125
2023-11-26 19:12:29,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.68 vs. limit=15.0
2023-11-26 19:12:31,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3530880.0, ans=0.125
2023-11-26 19:12:39,404 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 8.958e+01 9.518e+01 1.034e+02 1.414e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-26 19:12:41,705 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529650
2023-11-26 19:12:46,988 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 600, loss[loss=0.05993, simple_loss=0.07027, pruned_loss=0.0126, audio_tagging_loss=0.01219, over 15144.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08984, pruned_loss=0.01215, audio_tagging_loss=0.009163, over 2903943.35 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0
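Note that the cumulative frame counts in the tot_loss[... over 2856130.78 frames.] entries above are fractional, which a plain running total of integer frame counts could not produce; the tracker is presumably an exponentially decayed sum of both the frame count and the loss. A sketch of one scheme with that property (the update rule and decay constant are guesses, not the training script's actual tracker):

```python
def update_tot_loss(frames_sum: float, loss_sum: float,
                    batch_frames: float, batch_loss: float,
                    decay: float = 0.999):
    """Decay both running sums, then add the new batch.

    The reported tot_loss would be loss_sum / frames_sum; geometric decay
    of frames_sum explains the fractional 'over N frames' counts.
    """
    frames_sum = decay * frames_sum + batch_frames
    loss_sum = decay * loss_sum + batch_loss
    return frames_sum, loss_sum, loss_sum / frames_sum
```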
2023-11-26 19:12:52,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3531013.3333333335, ans=0.0
2023-11-26 19:13:03,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3531080.0, ans=0.125
2023-11-26 19:13:08,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3531146.6666666665, ans=0.125
2023-11-26 19:13:11,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3531146.6666666665, ans=0.125
2023-11-26 19:13:17,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3531146.6666666665, ans=0.0
2023-11-26 19:13:33,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5
2023-11-26 19:13:37,427 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529700
2023-11-26 19:13:41,618 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 650, loss[loss=0.04739, simple_loss=0.06103, pruned_loss=0.008527, audio_tagging_loss=0.008348, over 14571.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09071, pruned_loss=0.01243, audio_tagging_loss=0.009004, over 2932466.60 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:13:44,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3531346.6666666665, ans=0.0
2023-11-26 19:13:48,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.25 vs. limit=22.5
2023-11-26 19:13:49,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3531346.6666666665, ans=0.125
2023-11-26 19:13:51,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3531413.3333333335, ans=0.0
2023-11-26 19:14:28,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3531613.3333333335, ans=0.125
2023-11-26 19:14:29,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 9.030e+01 9.559e+01 1.040e+02 1.405e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-26 19:14:32,216 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529750
2023-11-26 19:14:34,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3531613.3333333335, ans=0.2
2023-11-26 19:14:36,341 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 700, loss[loss=0.0505, simple_loss=0.0773, pruned_loss=0.005498, audio_tagging_loss=0.006355, over 15060.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08996, pruned_loss=0.01212, audio_tagging_loss=0.009062, over 2962383.29 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:14:39,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3531680.0, ans=0.125
2023-11-26 19:14:42,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3531680.0, ans=0.0
2023-11-26 19:14:43,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3531680.0, ans=0.0
2023-11-26 19:14:56,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3531746.6666666665, ans=0.0
2023-11-26 19:15:10,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5
2023-11-26 19:15:13,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3531880.0, ans=0.1
2023-11-26 19:15:27,938 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529800
2023-11-26 19:15:32,433 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 750, loss[loss=0.06657, simple_loss=0.09259, pruned_loss=0.01274, audio_tagging_loss=0.007529, over 15959.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08988, pruned_loss=0.0119, audio_tagging_loss=0.009063, over 2991384.26 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 8.0
2023-11-26 19:15:47,047 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 19:15:53,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3532146.6666666665, ans=0.0
2023-11-26 19:16:08,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3532213.3333333335, ans=22.5
2023-11-26 19:16:09,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.81 vs. limit=22.5
2023-11-26 19:16:10,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3532213.3333333335, ans=0.125
2023-11-26 19:16:14,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.90 vs. limit=10.0
2023-11-26 19:16:22,632 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.840e+01 9.480e+01 1.009e+02 1.765e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-26 19:16:22,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3532280.0, ans=0.125
2023-11-26 19:16:23,769 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529850
2023-11-26 19:16:27,894 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 800, loss[loss=0.07433, simple_loss=0.1055, pruned_loss=0.01399, audio_tagging_loss=0.007605, over 15149.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08976, pruned_loss=0.01195, audio_tagging_loss=0.009129, over 3010056.07 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:16:30,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3532346.6666666665, ans=0.0
2023-11-26 19:16:38,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3532413.3333333335, ans=0.0
2023-11-26 19:16:44,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0
2023-11-26 19:16:50,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0
2023-11-26 19:17:01,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3532546.6666666665, ans=0.2
2023-11-26 19:17:16,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3532613.3333333335, ans=0.125
2023-11-26 19:17:16,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. limit=10.0
2023-11-26 19:17:18,526 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529900
2023-11-26 19:17:20,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532613.3333333335, ans=0.1
2023-11-26 19:17:22,615 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 850, loss[loss=0.0804, simple_loss=0.1094, pruned_loss=0.01513, audio_tagging_loss=0.01058, over 15811.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09014, pruned_loss=0.01218, audio_tagging_loss=0.009253, over 3015709.71 frames. ], batch size: 61, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:17:30,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3532680.0, ans=0.125
2023-11-26 19:17:33,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532746.6666666665, ans=0.1
2023-11-26 19:17:39,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3532746.6666666665, ans=0.0
2023-11-26 19:17:39,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3532746.6666666665, ans=0.2
2023-11-26 19:17:43,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3532746.6666666665, ans=0.2
2023-11-26 19:18:11,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 9.004e+01 9.716e+01 1.045e+02 1.656e+02, threshold=1.943e+02, percent-clipped=0.0
2023-11-26 19:18:13,157 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529950
2023-11-26 19:18:18,475 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 900, loss[loss=0.04291, simple_loss=0.0558, pruned_loss=0.006562, audio_tagging_loss=0.008447, over 14380.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09016, pruned_loss=0.0122, audio_tagging_loss=0.009267, over 3023343.73 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:19:06,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3533280.0, ans=0.125
2023-11-26 19:19:08,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3533280.0, ans=0.125
2023-11-26 19:19:09,845 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530000
2023-11-26 19:19:14,251 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 950, loss[loss=0.08059, simple_loss=0.1068, pruned_loss=0.01676, audio_tagging_loss=0.01043, over 14323.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09023, pruned_loss=0.01225, audio_tagging_loss=0.009101, over 3029021.46 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:19:21,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0
2023-11-26 19:19:26,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3533413.3333333335, ans=0.125
2023-11-26 19:19:28,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3533413.3333333335, ans=0.07
2023-11-26 19:19:28,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3533413.3333333335, ans=0.125
2023-11-26 19:19:42,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3533480.0, ans=0.1
2023-11-26 19:19:48,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3533546.6666666665, ans=0.125
2023-11-26 19:19:48,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3533546.6666666665, ans=0.1
2023-11-26 19:20:03,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.669e+01 9.436e+01 1.000e+02 1.329e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 19:20:04,919 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530050
2023-11-26 19:20:09,175 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1000, loss[loss=0.05369, simple_loss=0.07034, pruned_loss=0.009946, audio_tagging_loss=0.008575, over 14616.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09039, pruned_loss=0.01227, audio_tagging_loss=0.008829, over 3030704.19 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:20:16,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0
2023-11-26 19:20:29,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3533746.6666666665, ans=0.0
2023-11-26 19:20:33,243 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 19:20:34,629 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 19:20:40,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3533813.3333333335, ans=0.2
2023-11-26 19:21:00,085 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530100
2023-11-26 19:21:03,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3534013.3333333335, ans=0.0
2023-11-26 19:21:04,783 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1050, loss[loss=0.08416, simple_loss=0.123, pruned_loss=0.01592, audio_tagging_loss=0.00676, over 16188.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09037, pruned_loss=0.01234, audio_tagging_loss=0.008629, over 3027490.25 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:21:15,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0
2023-11-26 19:21:54,695 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.897e+01 9.575e+01 1.026e+02 1.368e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-26 19:21:55,824 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530150
2023-11-26 19:21:55,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3534280.0, ans=0.2
2023-11-26 19:21:59,997 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1100, loss[loss=0.05381, simple_loss=0.07432, pruned_loss=0.00515, audio_tagging_loss=0.0115, over 14381.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08908, pruned_loss=0.01217, audio_tagging_loss=0.008688, over 3027223.39 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:22:02,111 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
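The WARNING lines above show why AudioSet cuts with placeholder transcripts get dropped: 100 input frames shrink to 23 after the convolutional frontend, fewer than the 24 BPE tokens, so no valid transducer alignment exists. A sketch of that length check, assuming a frontend output length of ((T - 7) // 2 + 1) // 2, which is a guess that happens to reproduce the logged 100 -> 23:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Keep a cut only if it still has at least one output frame per token
    after subsampling; keep_cut(100, 24) is False, matching the warnings."""
    subsampled = ((num_frames - 7) // 2 + 1) // 2  # 100 -> 23, as logged
    return subsampled >= num_tokens
```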
Number of tokens: 24 2023-11-26 19:22:02,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3534346.6666666665, ans=0.0 2023-11-26 19:22:04,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3534346.6666666665, ans=0.2 2023-11-26 19:22:04,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3534346.6666666665, ans=0.1 2023-11-26 19:22:08,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3534346.6666666665, ans=0.125 2023-11-26 19:22:11,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3534413.3333333335, ans=0.07 2023-11-26 19:22:18,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3534413.3333333335, ans=0.125 2023-11-26 19:22:35,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-26 19:22:36,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3534546.6666666665, ans=0.1 2023-11-26 19:22:50,894 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530200 2023-11-26 19:22:55,386 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1150, loss[loss=0.05973, simple_loss=0.08423, pruned_loss=0.0105, audio_tagging_loss=0.007109, over 15798.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.0895, pruned_loss=0.01239, audio_tagging_loss=0.008714, over 3032941.03 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:23:02,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3534680.0, ans=0.125 2023-11-26 19:23:06,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0 2023-11-26 19:23:06,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2023-11-26 19:23:07,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3534746.6666666665, ans=0.2 2023-11-26 19:23:09,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3534746.6666666665, ans=0.125 2023-11-26 19:23:13,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3534746.6666666665, ans=0.125 2023-11-26 19:23:44,884 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.885e+01 9.403e+01 1.020e+02 1.405e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 19:23:46,010 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530250 2023-11-26 19:23:50,187 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1200, loss[loss=0.04941, simple_loss=0.06441, pruned_loss=0.007388, audio_tagging_loss=0.009819, over 13925.00 frames. 
], tot_loss[loss=0.06502, simple_loss=0.08832, pruned_loss=0.01219, audio_tagging_loss=0.008661, over 3032976.96 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:24:07,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3535080.0, ans=0.125 2023-11-26 19:24:16,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.58 vs. limit=15.0 2023-11-26 19:24:17,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3535146.6666666665, ans=0.125 2023-11-26 19:24:43,511 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530300 2023-11-26 19:24:47,701 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1250, loss[loss=0.06088, simple_loss=0.08086, pruned_loss=0.009826, audio_tagging_loss=0.01062, over 15276.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08796, pruned_loss=0.01216, audio_tagging_loss=0.008661, over 3032178.60 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:25:00,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-26 19:25:02,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3535413.3333333335, ans=0.125 2023-11-26 19:25:02,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2023-11-26 19:25:23,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3535546.6666666665, ans=0.0 2023-11-26 19:25:26,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3535546.6666666665, ans=0.125 2023-11-26 19:25:33,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3535613.3333333335, ans=0.0 2023-11-26 19:25:37,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2023-11-26 19:25:38,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.830e+01 9.575e+01 1.052e+02 2.949e+02, threshold=1.915e+02, percent-clipped=1.0 2023-11-26 19:25:38,922 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530350 2023-11-26 19:25:43,051 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1300, loss[loss=0.08193, simple_loss=0.1121, pruned_loss=0.01647, audio_tagging_loss=0.009421, over 15603.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08818, pruned_loss=0.01213, audio_tagging_loss=0.008655, over 3038595.16 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:26:02,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3535746.6666666665, ans=0.0 2023-11-26 19:26:03,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. 
limit=10.0 2023-11-26 19:26:05,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=22.5 2023-11-26 19:26:30,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3535946.6666666665, ans=0.05 2023-11-26 19:26:33,607 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530400 2023-11-26 19:26:38,080 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1350, loss[loss=0.07552, simple_loss=0.1024, pruned_loss=0.01485, audio_tagging_loss=0.009465, over 14982.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08788, pruned_loss=0.01211, audio_tagging_loss=0.008663, over 3042415.05 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:26:42,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2023-11-26 19:26:55,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3536080.0, ans=0.04949747468305833 2023-11-26 19:26:57,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3536080.0, ans=0.2 2023-11-26 19:27:18,133 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:27:30,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.624e+01 8.761e+01 9.344e+01 1.009e+02 1.308e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 19:27:30,607 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530450 2023-11-26 19:27:33,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3536280.0, ans=0.1 2023-11-26 19:27:34,950 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1400, loss[loss=0.06655, simple_loss=0.1013, pruned_loss=0.009726, audio_tagging_loss=0.006187, over 15966.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.0877, pruned_loss=0.01203, audio_tagging_loss=0.008702, over 3043006.28 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:28:07,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2023-11-26 19:28:09,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.28 vs. 
limit=22.5 2023-11-26 19:28:14,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3536546.6666666665, ans=0.0 2023-11-26 19:28:23,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3536613.3333333335, ans=0.2 2023-11-26 19:28:26,121 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530500 2023-11-26 19:28:28,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3536613.3333333335, ans=0.125 2023-11-26 19:28:30,900 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1450, loss[loss=0.0488, simple_loss=0.06417, pruned_loss=0.00703, audio_tagging_loss=0.009691, over 15727.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08845, pruned_loss=0.01217, audio_tagging_loss=0.008696, over 3047154.64 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:28:36,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3536680.0, ans=0.0 2023-11-26 19:28:37,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=3536680.0, ans=0.1 2023-11-26 19:28:42,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2023-11-26 19:28:50,760 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:28:53,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3536813.3333333335, ans=0.125 2023-11-26 19:28:57,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3536813.3333333335, ans=0.125 2023-11-26 19:29:02,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=22.5 2023-11-26 19:29:22,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.955e+01 9.646e+01 1.028e+02 1.353e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 19:29:22,245 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530550 2023-11-26 19:29:26,584 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1500, loss[loss=0.06061, simple_loss=0.08494, pruned_loss=0.01039, audio_tagging_loss=0.007748, over 15550.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08788, pruned_loss=0.01214, audio_tagging_loss=0.008898, over 3045221.76 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:29:44,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3537080.0, ans=0.2 2023-11-26 19:29:49,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3537146.6666666665, ans=0.125 2023-11-26 19:29:55,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3537146.6666666665, ans=0.125 2023-11-26 19:30:04,225 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:30:05,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3537213.3333333335, ans=0.125 2023-11-26 19:30:18,370 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530600 2023-11-26 19:30:20,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3537280.0, ans=0.125 2023-11-26 19:30:23,556 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1550, loss[loss=0.08509, simple_loss=0.1133, pruned_loss=0.02128, audio_tagging_loss=0.007179, over 14845.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08815, pruned_loss=0.01226, audio_tagging_loss=0.00899, over 3042793.56 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:30:25,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.37 vs. limit=15.0 2023-11-26 19:30:37,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3537413.3333333335, ans=0.035 2023-11-26 19:31:11,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3537613.3333333335, ans=0.125 2023-11-26 19:31:14,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 9.127e+01 9.575e+01 1.042e+02 1.304e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 19:31:14,335 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530650 2023-11-26 19:31:18,472 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1600, loss[loss=0.04824, simple_loss=0.05818, pruned_loss=0.00876, audio_tagging_loss=0.01039, over 14284.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08944, pruned_loss=0.01255, audio_tagging_loss=0.008986, over 3044337.30 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:31:19,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3537680.0, ans=0.0 2023-11-26 19:31:26,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3537680.0, ans=0.1 2023-11-26 19:32:10,442 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530700 2023-11-26 19:32:11,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3537946.6666666665, ans=0.1 2023-11-26 19:32:14,669 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1650, loss[loss=0.07246, simple_loss=0.1025, pruned_loss=0.01408, audio_tagging_loss=0.007151, over 14409.00 frames. 
], tot_loss[loss=0.06548, simple_loss=0.08819, pruned_loss=0.01231, audio_tagging_loss=0.009074, over 3034345.76 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:32:33,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3538080.0, ans=0.125 2023-11-26 19:32:53,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3538213.3333333335, ans=0.125 2023-11-26 19:33:00,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.81 vs. limit=22.5 2023-11-26 19:33:05,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.58 vs. limit=15.0 2023-11-26 19:33:06,455 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530750 2023-11-26 19:33:08,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.661e+01 9.270e+01 1.008e+02 1.328e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 19:33:11,689 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1700, loss[loss=0.06981, simple_loss=0.09395, pruned_loss=0.01505, audio_tagging_loss=0.007788, over 16046.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08875, pruned_loss=0.01226, audio_tagging_loss=0.009116, over 3039911.26 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:33:29,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3538413.3333333335, ans=0.125 2023-11-26 19:34:03,008 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530800 2023-11-26 19:34:07,603 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1750, loss[loss=0.05688, simple_loss=0.07926, pruned_loss=0.009546, audio_tagging_loss=0.007705, over 14942.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08814, pruned_loss=0.01211, audio_tagging_loss=0.009043, over 3037981.56 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:34:19,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3538746.6666666665, ans=0.0 2023-11-26 19:34:31,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3538813.3333333335, ans=0.0 2023-11-26 19:34:31,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3538813.3333333335, ans=0.1 2023-11-26 19:34:38,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3538813.3333333335, ans=0.0 2023-11-26 19:34:41,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3538880.0, ans=0.125 2023-11-26 19:34:49,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3538880.0, ans=0.125 2023-11-26 19:34:49,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.14 vs. 
limit=15.0 2023-11-26 19:34:58,831 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530850 2023-11-26 19:35:00,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 9.072e+01 9.578e+01 1.039e+02 1.393e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 19:35:03,573 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1800, loss[loss=0.06607, simple_loss=0.0943, pruned_loss=0.01146, audio_tagging_loss=0.007457, over 15735.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08876, pruned_loss=0.01228, audio_tagging_loss=0.008875, over 3037658.56 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:35:10,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3539013.3333333335, ans=0.95 2023-11-26 19:35:33,303 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:35:34,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3539146.6666666665, ans=0.1 2023-11-26 19:35:38,576 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:35:52,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3539280.0, ans=0.1 2023-11-26 19:35:55,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530900 2023-11-26 19:35:59,865 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1850, loss[loss=0.05502, simple_loss=0.07315, pruned_loss=0.008413, audio_tagging_loss=0.01004, over 16013.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08975, pruned_loss=0.0123, audio_tagging_loss=0.008674, over 3036680.01 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:36:06,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.14 vs. limit=6.0 2023-11-26 19:36:12,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3539413.3333333335, ans=0.0 2023-11-26 19:36:50,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3539613.3333333335, ans=0.125 2023-11-26 19:36:51,407 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530950 2023-11-26 19:36:53,505 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.894e+01 9.427e+01 1.010e+02 7.555e+02, threshold=1.885e+02, percent-clipped=1.0 2023-11-26 19:36:55,633 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1900, loss[loss=0.0687, simple_loss=0.09389, pruned_loss=0.014, audio_tagging_loss=0.007763, over 15490.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08911, pruned_loss=0.01233, audio_tagging_loss=0.008616, over 3034195.59 frames. 
2023-11-26 19:36:55,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3539680.0, ans=0.125
2023-11-26 19:37:21,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3539813.3333333335, ans=0.2
2023-11-26 19:37:27,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3539813.3333333335, ans=0.0
2023-11-26 19:37:37,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3539880.0, ans=0.125
2023-11-26 19:37:46,370 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531000
2023-11-26 19:37:50,783 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1950, loss[loss=0.06285, simple_loss=0.08641, pruned_loss=0.01254, audio_tagging_loss=0.007099, over 15847.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08794, pruned_loss=0.0121, audio_tagging_loss=0.008655, over 3028066.05 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 8.0
2023-11-26 19:37:53,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3540013.3333333335, ans=0.0
2023-11-26 19:37:53,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0
2023-11-26 19:38:05,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0
2023-11-26 19:38:06,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3540080.0, ans=0.125
2023-11-26 19:38:36,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=22.5
2023-11-26 19:38:42,142 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531050
2023-11-26 19:38:43,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3540280.0, ans=0.125
2023-11-26 19:38:44,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.819e+01 9.302e+01 9.928e+01 1.179e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-26 19:38:44,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0
2023-11-26 19:38:46,888 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2000, loss[loss=0.07957, simple_loss=0.1119, pruned_loss=0.01588, audio_tagging_loss=0.00772, over 15523.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08773, pruned_loss=0.0119, audio_tagging_loss=0.00867, over 3032704.57 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:39:02,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0
2023-11-26 19:39:23,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3540546.6666666665, ans=0.0
2023-11-26 19:39:30,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3540613.3333333335, ans=0.125
2023-11-26 19:39:38,333 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531100
2023-11-26 19:39:41,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3540680.0, ans=0.1
2023-11-26 19:39:42,511 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2050, loss[loss=0.05066, simple_loss=0.0673, pruned_loss=0.008869, audio_tagging_loss=0.008145, over 15094.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.088, pruned_loss=0.01208, audio_tagging_loss=0.008641, over 3028943.00 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:39:44,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3540680.0, ans=0.09899494936611666
2023-11-26 19:40:15,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0
2023-11-26 19:40:18,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3540880.0, ans=0.2
2023-11-26 19:40:33,204 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531150
2023-11-26 19:40:35,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.849e+01 9.497e+01 1.034e+02 1.158e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-26 19:40:37,438 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2100, loss[loss=0.06354, simple_loss=0.08742, pruned_loss=0.01159, audio_tagging_loss=0.008234, over 16088.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08854, pruned_loss=0.01205, audio_tagging_loss=0.008503, over 3033375.48 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:40:46,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3541013.3333333335, ans=0.125
2023-11-26 19:41:14,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3541213.3333333335, ans=0.1
2023-11-26 19:41:15,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0
2023-11-26 19:41:28,839 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531200
2023-11-26 19:41:33,345 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2150, loss[loss=0.05825, simple_loss=0.07958, pruned_loss=0.01087, audio_tagging_loss=0.0076, over 14977.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08821, pruned_loss=0.01206, audio_tagging_loss=0.008571, over 3035022.11 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0
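As a cross-check on the train_asr.py records above, the logged totals are consistent with a fixed linear combination of the three components: a 0.5 weight on simple_loss and unit weights on pruned_loss and audio_tagging_loss. The weights here are inferred from the logged numbers themselves, not read out of the training code:

```python
# Worked check against the batch 2000 record above (weights inferred from the log).
simple_loss, pruned_loss, audio_tagging_loss = 0.08773, 0.0119, 0.00867
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(loss)  # ~0.064435, matching the logged tot_loss loss=0.06443 up to rounding
```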
2023-11-26 19:41:52,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3541413.3333333335, ans=0.125
2023-11-26 19:42:01,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3541480.0, ans=0.05
2023-11-26 19:42:06,941 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 19:42:11,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3541546.6666666665, ans=0.125
2023-11-26 19:42:13,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3541546.6666666665, ans=0.125
2023-11-26 19:42:26,175 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531250
2023-11-26 19:42:28,255 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.897e+01 9.501e+01 1.024e+02 1.712e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-26 19:42:28,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3541613.3333333335, ans=0.1
2023-11-26 19:42:30,381 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2200, loss[loss=0.06496, simple_loss=0.07552, pruned_loss=0.01122, audio_tagging_loss=0.01598, over 14750.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08997, pruned_loss=0.01227, audio_tagging_loss=0.008528, over 3043295.49 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:43:01,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0
2023-11-26 19:43:04,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3541880.0, ans=0.125
2023-11-26 19:43:12,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3541880.0, ans=0.0
2023-11-26 19:43:14,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3541946.6666666665, ans=0.125
2023-11-26 19:43:19,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3541946.6666666665, ans=0.125
2023-11-26 19:43:21,587 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531300
2023-11-26 19:43:25,856 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2250, loss[loss=0.07754, simple_loss=0.09893, pruned_loss=0.01554, audio_tagging_loss=0.01253, over 15469.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09002, pruned_loss=0.0124, audio_tagging_loss=0.008557, over 3040316.22 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:43:32,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3542013.3333333335, ans=0.1
2023-11-26 19:43:41,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3542080.0, ans=0.125
2023-11-26 19:44:10,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3542280.0, ans=0.1
2023-11-26 19:44:10,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5
2023-11-26 19:44:17,331 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531350
2023-11-26 19:44:18,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3542280.0, ans=0.1
2023-11-26 19:44:19,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.901e+01 8.880e+01 9.630e+01 1.027e+02 2.263e+02, threshold=1.926e+02, percent-clipped=1.0
2023-11-26 19:44:21,511 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2300, loss[loss=0.05956, simple_loss=0.07789, pruned_loss=0.01255, audio_tagging_loss=0.008063, over 15186.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09002, pruned_loss=0.01239, audio_tagging_loss=0.008652, over 3037549.83 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:44:44,246 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 19:44:45,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3542480.0, ans=0.0
2023-11-26 19:44:46,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3542480.0, ans=0.125
2023-11-26 19:44:46,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0
2023-11-26 19:44:46,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=15.0
2023-11-26 19:44:54,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3542546.6666666665, ans=0.1
2023-11-26 19:45:00,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3542546.6666666665, ans=0.125
2023-11-26 19:45:05,574 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 19:45:09,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3542613.3333333335, ans=0.0
2023-11-26 19:45:11,705 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 19:45:14,417 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531400
2023-11-26 19:45:15,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3542613.3333333335, ans=0.125
2023-11-26 19:45:18,982 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2350, loss[loss=0.05837, simple_loss=0.07994, pruned_loss=0.01111, audio_tagging_loss=0.007291, over 16149.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08944, pruned_loss=0.01224, audio_tagging_loss=0.008719, over 3040278.71 frames. ], batch size: 61, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:45:22,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3542680.0, ans=0.0
2023-11-26 19:45:34,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3542746.6666666665, ans=0.2
2023-11-26 19:45:41,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3542813.3333333335, ans=0.1
2023-11-26 19:45:42,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3542813.3333333335, ans=0.0
2023-11-26 19:45:45,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=15.0
2023-11-26 19:45:57,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3542880.0, ans=0.125
2023-11-26 19:46:00,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3542880.0, ans=0.125
2023-11-26 19:46:05,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3542946.6666666665, ans=0.125
2023-11-26 19:46:08,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3542946.6666666665, ans=0.125
2023-11-26 19:46:10,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531450
2023-11-26 19:46:10,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3542946.6666666665, ans=0.09899494936611666
2023-11-26 19:46:12,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.878e+01 9.481e+01 9.940e+01 1.139e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-26 19:46:13,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3542946.6666666665, ans=0.1
2023-11-26 19:46:14,884 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2400, loss[loss=0.08046, simple_loss=0.1194, pruned_loss=0.01368, audio_tagging_loss=0.007084, over 14807.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09057, pruned_loss=0.01241, audio_tagging_loss=0.008734, over 3039054.70 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 32.0
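The WARNING records above show the length filter doing its job on AudioSet clips carrying a dummy transcript: a 1-second cut has 100 feature frames, only 23 frames survive the encoder's subsampling, and 23 frames cannot emit the 24 transcript tokens, so the cut is dropped. The sketch below reproduces that arithmetic (it matches the logged 100 → 23); treat the helper names as illustrative rather than the exact train_asr.py code.

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Consistent with the log: 100 frames in -> 23 frames out.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # The transducer needs at least one encoder frame per output token,
    # so drop cuts whose token count exceeds the subsampled length.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the dummy-text cuts above are excluded
```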
2023-11-26 19:46:20,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3543013.3333333335, ans=0.2
2023-11-26 19:46:29,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3543080.0, ans=0.015
2023-11-26 19:46:45,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3543146.6666666665, ans=0.1
2023-11-26 19:46:52,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0
2023-11-26 19:47:05,858 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531500
2023-11-26 19:47:10,008 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2450, loss[loss=0.04552, simple_loss=0.06113, pruned_loss=0.004789, audio_tagging_loss=0.01017, over 15589.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09039, pruned_loss=0.01222, audio_tagging_loss=0.00879, over 3035683.48 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:47:20,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3543413.3333333335, ans=0.07
2023-11-26 19:47:22,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3543413.3333333335, ans=0.5
2023-11-26 19:47:33,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3543480.0, ans=0.125
2023-11-26 19:47:43,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0
2023-11-26 19:48:02,287 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531550
2023-11-26 19:48:03,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3543613.3333333335, ans=0.125
2023-11-26 19:48:05,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.711e+01 9.383e+01 9.958e+01 1.270e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-26 19:48:07,073 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2500, loss[loss=0.05806, simple_loss=0.08295, pruned_loss=0.008017, audio_tagging_loss=0.00857, over 15324.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09102, pruned_loss=0.01232, audio_tagging_loss=0.008744, over 3041580.99 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:48:16,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.74 vs. limit=10.0
2023-11-26 19:48:33,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.26 vs. limit=15.0
2023-11-26 19:48:35,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0
2023-11-26 19:48:58,198 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531600
2023-11-26 19:49:03,304 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2550, loss[loss=0.06804, simple_loss=0.08353, pruned_loss=0.01453, audio_tagging_loss=0.01175, over 15110.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09086, pruned_loss=0.01235, audio_tagging_loss=0.008718, over 3040711.03 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0
2023-11-26 19:49:07,731 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 19:49:08,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3544013.3333333335, ans=0.0
2023-11-26 19:49:12,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0
2023-11-26 19:49:26,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3544146.6666666665, ans=0.125
2023-11-26 19:49:27,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3544146.6666666665, ans=0.0
2023-11-26 19:49:49,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0
2023-11-26 19:49:53,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3544280.0, ans=0.5
2023-11-26 19:49:54,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531650
2023-11-26 19:49:57,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.904e+01 9.624e+01 1.038e+02 1.426e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-26 19:49:58,730 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2600, loss[loss=0.06781, simple_loss=0.09225, pruned_loss=0.01569, audio_tagging_loss=0.005997, over 15337.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08972, pruned_loss=0.01221, audio_tagging_loss=0.008661, over 3040988.53 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 19:50:20,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3544413.3333333335, ans=0.1
2023-11-26 19:50:23,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3544480.0, ans=0.0
2023-11-26 19:50:38,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3544546.6666666665, ans=0.125
2023-11-26 19:50:46,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3544613.3333333335, ans=0.0
2023-11-26 19:50:50,880 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531700
2023-11-26 19:50:56,177 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2650, loss[loss=0.06263, simple_loss=0.08987, pruned_loss=0.009946, audio_tagging_loss=0.007743, over 14979.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0907, pruned_loss=0.01233, audio_tagging_loss=0.008526, over 3040595.13 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
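Each scaling.py ScheduledFloat record above reports one hyperparameter's current value (ans) at the current batch_count, e.g. a dropout_p of 0.1 or a skip rate already decayed to 0.0. A minimal re-implementation of the idea, piecewise-linear interpolation over (batch_count, value) breakpoints, is sketched here; it illustrates the logged behaviour and is not the actual scaling.py class.

```python
class PiecewiseLinearSchedule:
    """Illustrative sketch of a float hyperparameter scheduled on batch_count."""

    def __init__(self, *points) -> None:
        self.points = sorted(points)  # (batch_count, value) breakpoints

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]

# E.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches would
# report ans=0.0 at the large batch_counts seen in this log (values assumed).
skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (20000.0, 0.0))
assert skip_rate(3544280.0) == 0.0
```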
2023-11-26 19:51:34,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3544880.0, ans=0.2
2023-11-26 19:51:47,374 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531750
2023-11-26 19:51:50,499 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.824e+01 9.475e+01 1.010e+02 1.281e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-26 19:51:51,592 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2700, loss[loss=0.05883, simple_loss=0.07945, pruned_loss=0.00817, audio_tagging_loss=0.01094, over 14149.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09039, pruned_loss=0.0123, audio_tagging_loss=0.008536, over 3033973.85 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 19:51:54,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0
2023-11-26 19:51:59,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3545013.3333333335, ans=0.0
2023-11-26 19:52:05,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0
2023-11-26 19:52:17,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3545146.6666666665, ans=0.125
2023-11-26 19:52:22,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3545146.6666666665, ans=0.0
2023-11-26 19:52:24,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3545213.3333333335, ans=0.125
2023-11-26 19:52:26,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3545213.3333333335, ans=0.1
2023-11-26 19:52:37,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3545280.0, ans=0.125
2023-11-26 19:52:38,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3545280.0, ans=0.125
2023-11-26 19:52:39,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3545280.0, ans=0.1
2023-11-26 19:52:42,958 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531800
2023-11-26 19:52:46,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3545346.6666666665, ans=0.125
2023-11-26 19:52:47,352 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2750, loss[loss=0.07477, simple_loss=0.1011, pruned_loss=0.01503, audio_tagging_loss=0.009209, over 15790.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09054, pruned_loss=0.01233, audio_tagging_loss=0.008581, over 3037299.86 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 19:52:54,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=15.0
2023-11-26 19:53:04,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3545413.3333333335, ans=0.0
2023-11-26 19:53:20,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3545546.6666666665, ans=0.0
2023-11-26 19:53:20,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3545546.6666666665, ans=0.125
2023-11-26 19:53:28,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3545546.6666666665, ans=0.125
2023-11-26 19:53:34,579 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 19:53:37,802 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531850
2023-11-26 19:53:42,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 8.910e+01 9.547e+01 1.021e+02 1.473e+02, threshold=1.909e+02, percent-clipped=0.0
2023-11-26 19:53:43,097 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2800, loss[loss=0.07616, simple_loss=0.1116, pruned_loss=0.01408, audio_tagging_loss=0.006279, over 15537.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09024, pruned_loss=0.01229, audio_tagging_loss=0.008564, over 3037249.18 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 19:53:50,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3545680.0, ans=0.2
2023-11-26 19:54:18,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3545880.0, ans=0.5
2023-11-26 19:54:28,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3545946.6666666665, ans=0.125
2023-11-26 19:54:34,811 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531900
2023-11-26 19:54:36,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3545946.6666666665, ans=0.125
2023-11-26 19:54:37,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.12 vs. limit=15.0
2023-11-26 19:54:38,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3546013.3333333335, ans=0.2
2023-11-26 19:54:38,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3546013.3333333335, ans=0.0
2023-11-26 19:54:39,004 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2850, loss[loss=0.08309, simple_loss=0.1064, pruned_loss=0.02003, audio_tagging_loss=0.009863, over 14358.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08893, pruned_loss=0.01208, audio_tagging_loss=0.008569, over 3032938.67 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0
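The Whitening records compare a per-module statistic ("metric") against a limit; as long as the metric stays below the limit (e.g. 6.90 vs. limit=15.0 above) the constraint is inactive. One plausible proxy for such a statistic, equal to 1.0 when the feature covariance is proportional to the identity and growing as the eigenvalue spectrum becomes less flat, is sketched below. The exact statistic computed in scaling.py may differ; this is an assumption offered only for intuition.

```python
import torch

def whitening_metric_sketch(x: torch.Tensor) -> float:
    """Assumed proxy: dim * sum(C**2) / trace(C)**2 for covariance C.

    Equals 1.0 for perfectly "white" features (C = c * I) and grows as the
    covariance spectrum becomes more uneven; illustrative only.
    """
    x = x.reshape(-1, x.shape[-1])      # (frames, channels)
    c = (x.T @ x) / x.shape[0]          # empirical covariance
    dim = c.shape[0]
    return (dim * (c * c).sum() / c.diagonal().sum() ** 2).item()
```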
2023-11-26 19:55:00,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0
2023-11-26 19:55:04,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3546146.6666666665, ans=0.0
2023-11-26 19:55:04,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0
2023-11-26 19:55:23,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3546280.0, ans=0.0
2023-11-26 19:55:30,543 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531950
2023-11-26 19:55:34,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.835e+01 9.315e+01 9.874e+01 1.722e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-26 19:55:34,732 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2900, loss[loss=0.06227, simple_loss=0.07812, pruned_loss=0.01319, audio_tagging_loss=0.01002, over 14781.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.0891, pruned_loss=0.01205, audio_tagging_loss=0.008665, over 3032239.20 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 19:55:35,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0
2023-11-26 19:55:38,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3546346.6666666665, ans=0.0
2023-11-26 19:55:46,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3546413.3333333335, ans=0.125
2023-11-26 19:56:12,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3546546.6666666665, ans=0.1
2023-11-26 19:56:26,677 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532000
2023-11-26 19:56:26,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3546613.3333333335, ans=0.2
2023-11-26 19:56:28,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=22.5
2023-11-26 19:56:33,908 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2950, loss[loss=0.06132, simple_loss=0.08444, pruned_loss=0.009815, audio_tagging_loss=0.009285, over 15382.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08997, pruned_loss=0.01224, audio_tagging_loss=0.008637, over 3034920.04 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 19:57:05,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3546813.3333333335, ans=0.0
2023-11-26 19:57:25,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3546946.6666666665, ans=0.125
2023-11-26 19:57:26,199 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532050
2023-11-26 19:57:27,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3546946.6666666665, ans=0.0
2023-11-26 19:57:27,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0
2023-11-26 19:57:30,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.991e+01 9.608e+01 1.049e+02 1.344e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-26 19:57:30,315 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3000, loss[loss=0.06515, simple_loss=0.08833, pruned_loss=0.01125, audio_tagging_loss=0.009732, over 14943.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09003, pruned_loss=0.01232, audio_tagging_loss=0.008675, over 3038535.55 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 19:57:30,316 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-26 19:58:03,034 INFO [train_asr.py:1267] (1/4) Epoch 45, validation: loss=0.05745, simple_loss=0.05048, pruned_loss=0.005228, audio_tagging_loss=0.02698, over 4681554.00 frames.
2023-11-26 19:58:03,035 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-26 19:58:27,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=12.0
2023-11-26 19:58:28,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3547146.6666666665, ans=0.125
2023-11-26 19:58:46,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3547280.0, ans=0.125
2023-11-26 19:58:54,330 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532100
2023-11-26 19:59:00,131 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3050, loss[loss=0.04237, simple_loss=0.0486, pruned_loss=0.00782, audio_tagging_loss=0.01025, over 16027.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.0899, pruned_loss=0.01228, audio_tagging_loss=0.008671, over 3041506.36 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 19:59:00,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3547346.6666666665, ans=0.125
2023-11-26 19:59:31,806 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 19:59:51,775 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532150
2023-11-26 19:59:55,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3547680.0, ans=0.125
2023-11-26 19:59:55,897 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.969e+01 9.484e+01 1.021e+02 1.234e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-26 19:59:55,924 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3100, loss[loss=0.05233, simple_loss=0.06734, pruned_loss=0.008476, audio_tagging_loss=0.01018, over 13534.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09024, pruned_loss=0.01224, audio_tagging_loss=0.008742, over 3038377.36 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:00:06,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3547746.6666666665, ans=0.0
2023-11-26 20:00:47,256 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532200
2023-11-26 20:00:48,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3547946.6666666665, ans=0.125
2023-11-26 20:00:51,690 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3150, loss[loss=0.05022, simple_loss=0.07255, pruned_loss=0.004822, audio_tagging_loss=0.009121, over 15708.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0899, pruned_loss=0.01204, audio_tagging_loss=0.008774, over 3046008.90 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:00:55,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3548013.3333333335, ans=0.0
2023-11-26 20:01:09,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0
2023-11-26 20:01:15,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3548146.6666666665, ans=0.1
2023-11-26 20:01:20,693 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:01:29,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3548213.3333333335, ans=0.125
2023-11-26 20:01:33,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3548213.3333333335, ans=0.1
2023-11-26 20:01:41,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3548280.0, ans=0.125
2023-11-26 20:01:43,559 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532250
2023-11-26 20:01:47,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3548346.6666666665, ans=0.125
2023-11-26 20:01:47,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0
2023-11-26 20:01:48,383 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3200, loss[loss=0.05983, simple_loss=0.07397, pruned_loss=0.01167, audio_tagging_loss=0.01117, over 14768.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08955, pruned_loss=0.012, audio_tagging_loss=0.008921, over 3045587.17 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
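Because the training curve is buried in these interleaved records, a small parser for the tot_loss entries is handy when plotting progress. A sketch, with the regex written against the record format shown in this log:

```python
import re

# Matches e.g. "Epoch 45, batch 3100, ... tot_loss[loss=0.0661, ..."
PATTERN = re.compile(r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([\d.]+)")

def parse_tot_loss(log_text: str):
    """Yield (epoch, batch_idx, tot_loss) tuples from a training log."""
    for m in PATTERN.finditer(log_text):
        yield int(m.group(1)), int(m.group(2)), float(m.group(3))

# Usage: epochs, batches, losses = zip(*parse_tot_loss(open("train.log").read()))
```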
2023-11-26 20:01:49,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.870e+01 9.654e+01 1.076e+02 1.284e+02, threshold=1.931e+02, percent-clipped=0.0
2023-11-26 20:02:24,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3548546.6666666665, ans=0.1
2023-11-26 20:02:32,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0
2023-11-26 20:02:34,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3548613.3333333335, ans=0.0
2023-11-26 20:02:34,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.83 vs. limit=6.0
2023-11-26 20:02:40,672 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532300
2023-11-26 20:02:44,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3548680.0, ans=0.125
2023-11-26 20:02:44,882 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3250, loss[loss=0.07068, simple_loss=0.09742, pruned_loss=0.01349, audio_tagging_loss=0.008474, over 15176.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08916, pruned_loss=0.01203, audio_tagging_loss=0.00898, over 3052135.90 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:03:21,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3548880.0, ans=0.0
2023-11-26 20:03:28,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.64 vs. limit=15.0
2023-11-26 20:03:28,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0
2023-11-26 20:03:35,896 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532350
2023-11-26 20:03:37,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3548946.6666666665, ans=0.125
2023-11-26 20:03:40,078 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3300, loss[loss=0.04946, simple_loss=0.06676, pruned_loss=0.006445, audio_tagging_loss=0.009636, over 15265.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09053, pruned_loss=0.01236, audio_tagging_loss=0.008991, over 3047676.91 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:03:41,086 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.931e+01 9.545e+01 1.032e+02 1.663e+02, threshold=1.909e+02, percent-clipped=0.0
2023-11-26 20:03:56,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3549080.0, ans=0.95
2023-11-26 20:04:01,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3549080.0, ans=0.09899494936611666
2023-11-26 20:04:08,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3549146.6666666665, ans=0.125
2023-11-26 20:04:10,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3549146.6666666665, ans=0.125
2023-11-26 20:04:30,991 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532400
2023-11-26 20:04:35,385 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3350, loss[loss=0.07278, simple_loss=0.1035, pruned_loss=0.01294, audio_tagging_loss=0.008077, over 16560.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09096, pruned_loss=0.01256, audio_tagging_loss=0.008964, over 3051079.11 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 20:04:40,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2023-11-26 20:05:01,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3549480.0, ans=0.125
2023-11-26 20:05:27,172 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532450
2023-11-26 20:05:31,381 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3400, loss[loss=0.06723, simple_loss=0.09607, pruned_loss=0.009451, audio_tagging_loss=0.009747, over 14553.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09115, pruned_loss=0.01255, audio_tagging_loss=0.008831, over 3052175.40 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 20:05:33,490 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.780e+01 9.356e+01 1.019e+02 1.296e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-26 20:05:43,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3549746.6666666665, ans=0.125
2023-11-26 20:05:54,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3549813.3333333335, ans=0.125
2023-11-26 20:06:18,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3549946.6666666665, ans=0.1
2023-11-26 20:06:22,055 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532500
2023-11-26 20:06:26,252 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3450, loss[loss=0.05336, simple_loss=0.07559, pruned_loss=0.008159, audio_tagging_loss=0.00741, over 14684.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09038, pruned_loss=0.01224, audio_tagging_loss=0.008691, over 3051203.90 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 20:06:32,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3550013.3333333335, ans=0.2
2023-11-26 20:06:40,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3550080.0, ans=0.125
2023-11-26 20:06:55,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3550146.6666666665, ans=0.125
2023-11-26 20:07:17,333 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532550
2023-11-26 20:07:21,481 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3500, loss[loss=0.05499, simple_loss=0.07579, pruned_loss=0.008113, audio_tagging_loss=0.008976, over 15226.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09048, pruned_loss=0.0123, audio_tagging_loss=0.008579, over 3050564.05 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 20:07:23,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 8.941e+01 9.512e+01 1.027e+02 1.407e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-26 20:07:45,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3550480.0, ans=0.125
2023-11-26 20:07:50,157 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 20:08:14,257 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532600
2023-11-26 20:08:19,287 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3550, loss[loss=0.06041, simple_loss=0.08321, pruned_loss=0.00999, audio_tagging_loss=0.008808, over 14914.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08963, pruned_loss=0.01227, audio_tagging_loss=0.008577, over 3044218.77 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 20:08:24,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3550680.0, ans=0.0
2023-11-26 20:08:35,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3550746.6666666665, ans=0.0
2023-11-26 20:08:40,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3550813.3333333335, ans=0.05
2023-11-26 20:08:41,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3550813.3333333335, ans=0.125
2023-11-26 20:08:53,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3550880.0, ans=0.2
2023-11-26 20:09:10,411 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532650
2023-11-26 20:09:11,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3550946.6666666665, ans=0.0
2023-11-26 20:09:14,580 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3600, loss[loss=0.04573, simple_loss=0.06076, pruned_loss=0.006875, audio_tagging_loss=0.008476, over 15786.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.0885, pruned_loss=0.01208, audio_tagging_loss=0.008659, over 3045895.28 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:09:16,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.819e+01 9.427e+01 1.004e+02 1.284e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-26 20:09:20,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3551013.3333333335, ans=0.0
2023-11-26 20:09:24,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.26 vs. limit=10.0
2023-11-26 20:09:30,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3551080.0, ans=0.0
2023-11-26 20:09:51,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0
2023-11-26 20:10:05,623 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532700
2023-11-26 20:10:09,761 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3650, loss[loss=0.03956, simple_loss=0.03897, pruned_loss=0.007271, audio_tagging_loss=0.0128, over 15386.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08848, pruned_loss=0.01213, audio_tagging_loss=0.008639, over 3039677.91 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:10:24,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3551413.3333333335, ans=0.2
2023-11-26 20:10:27,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3551413.3333333335, ans=0.125
2023-11-26 20:10:49,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3551546.6666666665, ans=0.125
2023-11-26 20:10:50,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3551546.6666666665, ans=0.125
2023-11-26 20:10:57,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3551613.3333333335, ans=0.0
2023-11-26 20:11:00,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3551613.3333333335, ans=0.1
2023-11-26 20:11:02,742 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532750
2023-11-26 20:11:06,844 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3700, loss[loss=0.09529, simple_loss=0.1394, pruned_loss=0.02035, audio_tagging_loss=0.005229, over 15939.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08996, pruned_loss=0.01229, audio_tagging_loss=0.008551, over 3047724.63 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:11:08,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.774e+01 9.496e+01 1.016e+02 1.285e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-26 20:11:13,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3551680.0, ans=0.0
2023-11-26 20:11:17,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3551746.6666666665, ans=0.0
2023-11-26 20:11:47,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3551880.0, ans=0.125
2023-11-26 20:11:48,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3551880.0, ans=0.09899494936611666
2023-11-26 20:11:58,956 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532800
2023-11-26 20:12:03,431 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3750, loss[loss=0.06966, simple_loss=0.09958, pruned_loss=0.01285, audio_tagging_loss=0.007029, over 15702.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09014, pruned_loss=0.01228, audio_tagging_loss=0.008567, over 3054417.41 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:12:06,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3552013.3333333335, ans=0.125
2023-11-26 20:12:27,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3552146.6666666665, ans=0.125
2023-11-26 20:12:41,473 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:12:52,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3552280.0, ans=0.125 2023-11-26 20:12:54,259 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532850 2023-11-26 20:12:58,427 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3800, loss[loss=0.06916, simple_loss=0.09847, pruned_loss=0.01256, audio_tagging_loss=0.007365, over 15537.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09025, pruned_loss=0.01224, audio_tagging_loss=0.008597, over 3056426.59 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:13:00,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.984e+01 9.632e+01 1.029e+02 1.593e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-26 20:13:19,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3552413.3333333335, ans=0.125 2023-11-26 20:13:31,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3552546.6666666665, ans=0.07 2023-11-26 20:13:32,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3552546.6666666665, ans=0.125 2023-11-26 20:13:49,773 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532900 2023-11-26 20:13:54,532 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3850, loss[loss=0.07601, simple_loss=0.1024, pruned_loss=0.0161, audio_tagging_loss=0.008682, over 15219.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09102, pruned_loss=0.0125, audio_tagging_loss=0.008589, over 3052515.96 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:14:42,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3552946.6666666665, ans=0.0 2023-11-26 20:14:45,345 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532950 2023-11-26 20:14:48,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3553013.3333333335, ans=0.125 2023-11-26 20:14:48,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3553013.3333333335, ans=0.95 2023-11-26 20:14:49,658 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3900, loss[loss=0.06379, simple_loss=0.08858, pruned_loss=0.01355, audio_tagging_loss=0.005952, over 13420.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09012, pruned_loss=0.01231, audio_tagging_loss=0.008698, over 3051399.26 frames. ], batch size: 52, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:14:52,295 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.931e+01 9.529e+01 1.011e+02 1.303e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 20:15:10,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.09 vs. 
limit=12.0 2023-11-26 20:15:11,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3553146.6666666665, ans=0.0 2023-11-26 20:15:23,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3553213.3333333335, ans=0.1 2023-11-26 20:15:26,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3553213.3333333335, ans=0.0 2023-11-26 20:15:28,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3553213.3333333335, ans=22.5 2023-11-26 20:15:40,849 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533000 2023-11-26 20:15:45,315 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3950, loss[loss=0.06113, simple_loss=0.08644, pruned_loss=0.009962, audio_tagging_loss=0.00795, over 15572.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09044, pruned_loss=0.0123, audio_tagging_loss=0.008813, over 3051919.73 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:15:45,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0 2023-11-26 20:15:49,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3553346.6666666665, ans=0.025 2023-11-26 20:15:59,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3553413.3333333335, ans=0.0 2023-11-26 20:16:03,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3553413.3333333335, ans=0.125 2023-11-26 20:16:04,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3553413.3333333335, ans=0.125 2023-11-26 20:16:23,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3553546.6666666665, ans=0.0 2023-11-26 20:16:25,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3553546.6666666665, ans=0.04949747468305833 2023-11-26 20:16:29,115 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:16:30,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3553613.3333333335, ans=0.0 2023-11-26 20:16:36,471 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533050 2023-11-26 20:16:39,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3553613.3333333335, ans=0.0 2023-11-26 20:16:42,271 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4000, loss[loss=0.05344, simple_loss=0.071, pruned_loss=0.009258, audio_tagging_loss=0.008683, over 13785.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09037, pruned_loss=0.01225, audio_tagging_loss=0.008803, over 3051014.84 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:16:44,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs. 
limit=15.0 2023-11-26 20:16:44,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 8.850e+01 9.399e+01 1.031e+02 1.680e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 20:16:53,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3553746.6666666665, ans=0.2 2023-11-26 20:16:55,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3553746.6666666665, ans=0.09899494936611666 2023-11-26 20:16:59,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=12.0 2023-11-26 20:17:00,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=12.0 2023-11-26 20:17:02,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3553813.3333333335, ans=0.0 2023-11-26 20:17:06,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2023-11-26 20:17:20,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3553880.0, ans=0.125 2023-11-26 20:17:33,293 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533100 2023-11-26 20:17:37,519 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4050, loss[loss=0.06886, simple_loss=0.09412, pruned_loss=0.009871, audio_tagging_loss=0.01193, over 15534.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09063, pruned_loss=0.01234, audio_tagging_loss=0.008846, over 3048923.46 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:17:38,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554013.3333333335, ans=0.1 2023-11-26 20:17:39,675 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:17:57,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=12.0 2023-11-26 20:18:03,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2023-11-26 20:18:29,017 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533150 2023-11-26 20:18:33,723 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4100, loss[loss=0.06896, simple_loss=0.08589, pruned_loss=0.01823, audio_tagging_loss=0.007781, over 15037.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09022, pruned_loss=0.01216, audio_tagging_loss=0.008867, over 3053639.37 frames. 
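A note on how the bracketed loss records above decompose: the printed totals are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. For the batch 4100 record just above, 0.5 * 0.08589 + 0.01823 + 0.007781 ≈ 0.06896. The sketch below assumes exactly this weighting and ignores the warm-up interpolation between the simple and pruned terms that applies early in training (long past at batch indices above 500k):

```python
# Sketch of the loss combination these records are consistent with; the
# warm-up ramp used early in training is deliberately omitted (assumption).
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 4100 above: reproduces loss=0.06896 from its three components.
assert abs(combined_loss(0.08589, 0.01823, 0.007781) - 0.06896) < 1e-4
```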
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:18:36,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.812e+01 9.418e+01 1.019e+02 1.290e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 20:18:42,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3554346.6666666665, ans=0.125 2023-11-26 20:18:54,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3554413.3333333335, ans=0.125 2023-11-26 20:19:10,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554546.6666666665, ans=0.1 2023-11-26 20:19:24,913 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533200 2023-11-26 20:19:29,893 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4150, loss[loss=0.0814, simple_loss=0.1135, pruned_loss=0.01667, audio_tagging_loss=0.00796, over 15069.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09037, pruned_loss=0.0122, audio_tagging_loss=0.00867, over 3058924.03 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:19:38,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3554680.0, ans=0.125 2023-11-26 20:19:41,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3554746.6666666665, ans=0.125 2023-11-26 20:19:49,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3554746.6666666665, ans=0.0 2023-11-26 20:20:09,858 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:20:19,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3554946.6666666665, ans=0.0 2023-11-26 20:20:21,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533250 2023-11-26 20:20:23,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-11-26 20:20:26,196 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4200, loss[loss=0.06109, simple_loss=0.09354, pruned_loss=0.007466, audio_tagging_loss=0.006853, over 15689.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09059, pruned_loss=0.01228, audio_tagging_loss=0.008607, over 3068734.94 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:20:27,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3555013.3333333335, ans=0.0 2023-11-26 20:20:29,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.868e+01 9.396e+01 9.993e+01 1.238e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 20:20:33,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.83 vs. 
limit=22.5 2023-11-26 20:20:47,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3555146.6666666665, ans=0.0 2023-11-26 20:20:58,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3555146.6666666665, ans=0.125 2023-11-26 20:21:00,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0 2023-11-26 20:21:01,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3555213.3333333335, ans=0.2 2023-11-26 20:21:03,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2023-11-26 20:21:13,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-26 20:21:17,442 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533300 2023-11-26 20:21:20,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3555346.6666666665, ans=0.0 2023-11-26 20:21:21,706 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4250, loss[loss=0.08775, simple_loss=0.1222, pruned_loss=0.01989, audio_tagging_loss=0.006747, over 15717.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09096, pruned_loss=0.01244, audio_tagging_loss=0.008551, over 3061075.28 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:21:23,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3555346.6666666665, ans=0.035 2023-11-26 20:21:26,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3555346.6666666665, ans=0.05 2023-11-26 20:21:40,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3555413.3333333335, ans=0.1 2023-11-26 20:21:46,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3555480.0, ans=0.125 2023-11-26 20:21:48,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3555480.0, ans=0.1 2023-11-26 20:21:55,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.97 vs. limit=10.0 2023-11-26 20:21:59,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3555546.6666666665, ans=0.0 2023-11-26 20:22:03,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3555546.6666666665, ans=0.125 2023-11-26 20:22:13,673 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533350 2023-11-26 20:22:17,922 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4300, loss[loss=0.06434, simple_loss=0.09471, pruned_loss=0.01023, audio_tagging_loss=0.006749, over 15598.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09021, pruned_loss=0.01225, audio_tagging_loss=0.008565, over 3050759.55 frames. 
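The many scaling.py lines of the form "ScheduledFloat: name=..., batch_count=..., ans=..." report hyperparameters that are scheduled as a function of the global batch count. A minimal re-implementation of that idea is sketched below; the breakpoints are made up for illustration and are not the schedules actually attached to these Zipformer modules:

```python
# Piecewise-linear schedule over batch count, the idea behind the
# ScheduledFloat log lines. Breakpoints here are illustrative assumptions.
class ScheduledFloat:
    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs.
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                # Linear interpolation between neighbouring breakpoints.
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        raise AssertionError("unreachable")

skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(3555213.33))  # 0.0: far past the last breakpoint
```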
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:22:21,696 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.104e+01 9.879e+01 1.029e+02 1.419e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-26 20:22:30,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3555746.6666666665, ans=0.0 2023-11-26 20:22:41,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=15.0 2023-11-26 20:23:10,870 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533400 2023-11-26 20:23:15,334 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4350, loss[loss=0.08199, simple_loss=0.1212, pruned_loss=0.01525, audio_tagging_loss=0.006132, over 14977.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.0913, pruned_loss=0.01251, audio_tagging_loss=0.00845, over 3040329.84 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:23:28,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3556080.0, ans=0.95 2023-11-26 20:23:34,741 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:23:37,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0 2023-11-26 20:24:06,182 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533450 2023-11-26 20:24:10,364 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4400, loss[loss=0.05941, simple_loss=0.0868, pruned_loss=0.009874, audio_tagging_loss=0.006131, over 15883.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09074, pruned_loss=0.01252, audio_tagging_loss=0.008443, over 3034280.03 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:24:13,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.838e+01 9.451e+01 1.042e+02 1.230e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 20:24:22,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2023-11-26 20:24:23,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2023-11-26 20:24:37,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3556480.0, ans=0.0 2023-11-26 20:24:40,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5 2023-11-26 20:24:49,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3556546.6666666665, ans=0.2 2023-11-26 20:24:54,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0 2023-11-26 20:25:01,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.73 vs. 
limit=10.0 2023-11-26 20:25:01,848 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533500 2023-11-26 20:25:06,071 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4450, loss[loss=0.06927, simple_loss=0.09653, pruned_loss=0.01277, audio_tagging_loss=0.00823, over 16150.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09109, pruned_loss=0.01236, audio_tagging_loss=0.008408, over 3046339.62 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:25:06,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3556680.0, ans=0.1 2023-11-26 20:25:53,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3556946.6666666665, ans=0.1 2023-11-26 20:25:57,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3556946.6666666665, ans=0.125 2023-11-26 20:25:58,328 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533550 2023-11-26 20:26:02,377 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4500, loss[loss=0.05532, simple_loss=0.06872, pruned_loss=0.01166, audio_tagging_loss=0.0093, over 16049.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09127, pruned_loss=0.01244, audio_tagging_loss=0.008434, over 3044483.48 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:26:05,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.109e+01 9.005e+01 9.519e+01 1.049e+02 1.463e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 20:26:53,418 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533600 2023-11-26 20:26:57,883 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4550, loss[loss=0.0544, simple_loss=0.07871, pruned_loss=0.007874, audio_tagging_loss=0.007173, over 13660.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08904, pruned_loss=0.01208, audio_tagging_loss=0.00848, over 3034031.86 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:27:16,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3557413.3333333335, ans=0.0 2023-11-26 20:27:39,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3557546.6666666665, ans=0.1 2023-11-26 20:27:40,728 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:27:49,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533650 2023-11-26 20:27:51,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3557613.3333333335, ans=0.0 2023-11-26 20:27:53,363 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4600, loss[loss=0.07543, simple_loss=0.1051, pruned_loss=0.01722, audio_tagging_loss=0.005659, over 15733.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08903, pruned_loss=0.01223, audio_tagging_loss=0.008613, over 3044256.52 frames. 
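The "Exclude cut" warnings above are a dataloader-side length filter at work: AudioSet clips carry a fixed dummy transcript, and a 1-second clip yields fewer encoder frames after subsampling (23) than it has BPE tokens (24), which would make the transducer loss infeasible. A sketch of that check follows; the exact frame arithmetic is an assumption modelled on a two-stage stride-2 convolutional frontend, chosen because it reproduces the 100 -> 23 figures in the warnings, not copied from icefall:

```python
# Minimal sketch of the cut filter implied by the "Exclude cut" warnings.
def frames_after_subsampling(num_frames: int) -> int:
    # Two stride-2 stages roughly quarter the frame count (assumption;
    # matches the logged 100 -> 23).
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer-style loss needs at least as many encoder frames as
    # output tokens, so short cuts with long token sequences are dropped.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as in the warning above
print(keep_cut(100, 24))              # False -> cut is excluded
```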
], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:27:56,457 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.920e+01 9.626e+01 1.020e+02 1.318e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 20:28:02,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3557680.0, ans=0.1 2023-11-26 20:28:06,865 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:28:21,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3557813.3333333335, ans=10.0 2023-11-26 20:28:25,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3557813.3333333335, ans=0.125 2023-11-26 20:28:29,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5 2023-11-26 20:28:44,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2023-11-26 20:28:45,464 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533700 2023-11-26 20:28:50,180 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4650, loss[loss=0.04723, simple_loss=0.06307, pruned_loss=0.006815, audio_tagging_loss=0.008881, over 14810.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08928, pruned_loss=0.01217, audio_tagging_loss=0.008706, over 3048384.77 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:28:53,075 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:29:09,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3558080.0, ans=0.0 2023-11-26 20:29:09,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.65 vs. limit=22.5 2023-11-26 20:29:10,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3558080.0, ans=0.2 2023-11-26 20:29:16,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3558146.6666666665, ans=0.125 2023-11-26 20:29:34,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=12.0 2023-11-26 20:29:42,059 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533750 2023-11-26 20:29:46,206 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4700, loss[loss=0.05506, simple_loss=0.07326, pruned_loss=0.009631, audio_tagging_loss=0.008801, over 16261.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.0894, pruned_loss=0.01199, audio_tagging_loss=0.008792, over 3053057.81 frames. 
], batch size: 64, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:29:50,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.173e+01 8.876e+01 9.435e+01 1.008e+02 1.247e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 20:29:51,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3558346.6666666665, ans=0.125 2023-11-26 20:30:20,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3558546.6666666665, ans=0.1 2023-11-26 20:30:35,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3558613.3333333335, ans=0.125 2023-11-26 20:30:36,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3558613.3333333335, ans=0.1 2023-11-26 20:30:36,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3558613.3333333335, ans=0.2 2023-11-26 20:30:36,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2023-11-26 20:30:37,101 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533800 2023-11-26 20:30:41,606 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4750, loss[loss=0.04823, simple_loss=0.06265, pruned_loss=0.005667, audio_tagging_loss=0.01124, over 15719.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09024, pruned_loss=0.01202, audio_tagging_loss=0.008895, over 3049943.45 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:30:48,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3558680.0, ans=0.125 2023-11-26 20:30:57,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3558746.6666666665, ans=0.0 2023-11-26 20:31:04,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=22.5 2023-11-26 20:31:33,075 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533850 2023-11-26 20:31:34,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3558946.6666666665, ans=0.2 2023-11-26 20:31:38,362 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4800, loss[loss=0.06637, simple_loss=0.09169, pruned_loss=0.01273, audio_tagging_loss=0.007792, over 15561.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09019, pruned_loss=0.01204, audio_tagging_loss=0.008941, over 3054144.47 frames. 
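The optim.py lines report an adaptive clipping threshold: in each of them the printed threshold equals Clipping_scale (2.0) times the median of the recent gradient-norm distribution, e.g. 2.0 * 9.435e+01 = 1.887e+02 in the line above. A sketch of such a scheme follows; the history length and the clipping mechanics are assumptions, only the threshold = scale * median relation is taken from the log:

```python
import torch

# Adaptive gradient clipping keyed to the median of recent gradient norms,
# matching the printed relation threshold = clipping_scale * median.
# Buffer size and mechanics are illustrative assumptions.
class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 400):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms: list[float] = []

    def clip_(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms = (self.norms + [norm])[-self.history:]
        threshold = self.clipping_scale * float(
            torch.tensor(self.norms).median()
        )
        if norm > threshold:  # such batches would show up in percent-clipped
            for g in grads:
                g.mul_(threshold / norm)
        return threshold
```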
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:31:42,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.037e+01 9.476e+01 1.008e+02 1.757e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 20:31:56,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3559080.0, ans=0.07 2023-11-26 20:32:11,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3559213.3333333335, ans=0.125 2023-11-26 20:32:19,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3559213.3333333335, ans=0.1 2023-11-26 20:32:29,594 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533900 2023-11-26 20:32:31,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5 2023-11-26 20:32:34,183 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4850, loss[loss=0.07729, simple_loss=0.1122, pruned_loss=0.01294, audio_tagging_loss=0.008235, over 14649.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09014, pruned_loss=0.01202, audio_tagging_loss=0.008895, over 3057783.32 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:32:42,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3559346.6666666665, ans=0.0 2023-11-26 20:32:43,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3559413.3333333335, ans=0.05 2023-11-26 20:33:07,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3559546.6666666665, ans=0.0 2023-11-26 20:33:25,473 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533950 2023-11-26 20:33:28,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.81 vs. limit=10.0 2023-11-26 20:33:29,631 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4900, loss[loss=0.06025, simple_loss=0.08834, pruned_loss=0.009447, audio_tagging_loss=0.006632, over 15371.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09009, pruned_loss=0.01207, audio_tagging_loss=0.008891, over 3055532.44 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:33:34,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.987e+01 9.681e+01 1.025e+02 1.624e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-26 20:33:46,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3559746.6666666665, ans=0.125 2023-11-26 20:33:50,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3559746.6666666665, ans=0.1 2023-11-26 20:34:13,154 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:34:14,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3559946.6666666665, ans=0.125 2023-11-26 20:34:20,403 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534000 2023-11-26 20:34:20,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3559946.6666666665, ans=0.0 2023-11-26 20:34:25,476 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4950, loss[loss=0.0578, simple_loss=0.07981, pruned_loss=0.008962, audio_tagging_loss=0.008931, over 15416.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09006, pruned_loss=0.01223, audio_tagging_loss=0.00882, over 3048305.71 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:34:28,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=12.0 2023-11-26 20:34:37,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3560080.0, ans=0.07 2023-11-26 20:34:51,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3560146.6666666665, ans=0.2 2023-11-26 20:34:54,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3560146.6666666665, ans=0.0 2023-11-26 20:35:01,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3560213.3333333335, ans=0.1 2023-11-26 20:35:16,320 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534050 2023-11-26 20:35:20,443 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5000, loss[loss=0.07028, simple_loss=0.09022, pruned_loss=0.0154, audio_tagging_loss=0.009762, over 15674.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0897, pruned_loss=0.01224, audio_tagging_loss=0.008633, over 3036820.42 frames. 
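The "Whitening: ... metric=M vs. limit=L" lines compare a statistic of each module's feature covariance against a limit; a penalty is applied only when the metric exceeds the limit, which is why most lines report values below it. The metric below, the ratio of the mean squared covariance eigenvalue to the squared mean per group, is only a plausible stand-in for whatever scaling.py actually computes (an assumption): it is about 1 for perfectly white features and grows as channels become correlated:

```python
import torch

# One plausible whitening statistic (assumption, not scaling.py's formula):
# per group, E[lambda^2] / E[lambda]^2 over the eigenvalues of the feature
# covariance. White features give ~1; correlated channels give much more.
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    n, c = x.shape
    xg = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    xg = xg - xg.mean(dim=1, keepdim=True)
    cov = xg.transpose(1, 2) @ xg / n            # (groups, c/g, c/g)
    eig = torch.linalg.eigvalsh(cov)
    return float(((eig ** 2).mean(dim=1) / eig.mean(dim=1) ** 2).max())

print(whitening_metric(torch.randn(4000, 256)))                      # ~1.06
print(whitening_metric(torch.randn(4000, 1) * torch.ones(1, 256)))   # >> 1
```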
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:35:21,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3560346.6666666665, ans=0.125 2023-11-26 20:35:26,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 9.102e+01 9.598e+01 1.044e+02 1.473e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-26 20:35:31,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3560413.3333333335, ans=0.125 2023-11-26 20:35:31,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3560413.3333333335, ans=0.0 2023-11-26 20:35:32,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3560413.3333333335, ans=0.125 2023-11-26 20:35:39,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3560413.3333333335, ans=0.125 2023-11-26 20:35:50,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3560480.0, ans=0.125 2023-11-26 20:35:50,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3560480.0, ans=0.2 2023-11-26 20:35:59,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.10 vs. limit=12.0 2023-11-26 20:36:05,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3560613.3333333335, ans=0.125 2023-11-26 20:36:12,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534100 2023-11-26 20:36:13,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3560613.3333333335, ans=0.1 2023-11-26 20:36:16,214 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5050, loss[loss=0.06149, simple_loss=0.08453, pruned_loss=0.01026, audio_tagging_loss=0.008962, over 15383.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08965, pruned_loss=0.01226, audio_tagging_loss=0.008444, over 3032586.43 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:36:32,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3560746.6666666665, ans=0.1 2023-11-26 20:37:07,431 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534150 2023-11-26 20:37:12,199 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5100, loss[loss=0.06116, simple_loss=0.08173, pruned_loss=0.009525, audio_tagging_loss=0.01077, over 15821.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08973, pruned_loss=0.01226, audio_tagging_loss=0.008452, over 3036039.01 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:37:18,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.922e+01 9.558e+01 1.035e+02 1.358e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 20:37:25,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3561080.0, ans=0.05 2023-11-26 20:37:27,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3561080.0, ans=0.0 2023-11-26 20:37:51,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3561213.3333333335, ans=0.0 2023-11-26 20:38:04,586 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534200 2023-11-26 20:38:07,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3561280.0, ans=0.1 2023-11-26 20:38:09,084 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5150, loss[loss=0.08927, simple_loss=0.1373, pruned_loss=0.01585, audio_tagging_loss=0.004791, over 15712.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08891, pruned_loss=0.01207, audio_tagging_loss=0.008554, over 3036659.17 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:38:42,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3561546.6666666665, ans=0.125 2023-11-26 20:38:42,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.24 vs. limit=10.0 2023-11-26 20:38:47,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3561546.6666666665, ans=0.125 2023-11-26 20:39:00,368 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534250 2023-11-26 20:39:05,070 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5200, loss[loss=0.05967, simple_loss=0.07049, pruned_loss=0.01517, audio_tagging_loss=0.009247, over 15792.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08794, pruned_loss=0.01189, audio_tagging_loss=0.008561, over 3031533.86 frames. 
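A note on the fractional frame counts in the tot_loss records (e.g. "over 3036659.17 frames"): the running totals behave like a geometrically decaying accumulator rather than a plain sum, so the effective frame count settles near (decay horizon) * (per-batch frames); with roughly 15.5k frames per batch, the logged ~3.0e6 totals correspond to a horizon of about 200 batches. The sketch below assumes that exponential-decay bookkeeping; it is an illustration, not icefall's MetricsTracker:

```python
# Decaying accumulator that reproduces fractional "over N frames" counts:
# each batch the totals shrink by (1 - 1/horizon) before the new batch's
# sums are added (horizon=200 is an assumption fitted to the logged counts).
class RunningLoss:
    def __init__(self, horizon: int = 200):
        self.decay = 1.0 - 1.0 / horizon
        self.frames = 0.0
        self.loss_sum = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float) -> None:
        self.frames = self.frames * self.decay + batch_frames
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum

    @property
    def loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
for _ in range(2000):
    tracker.update(batch_loss_sum=0.066 * 15500, batch_frames=15500)
print(round(tracker.frames), round(tracker.loss, 3))  # ~3100000, 0.066
```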
], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:39:05,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3561680.0, ans=0.1 2023-11-26 20:39:08,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3561680.0, ans=0.2 2023-11-26 20:39:10,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.817e+01 9.257e+01 9.950e+01 1.216e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 20:39:15,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3561746.6666666665, ans=0.125 2023-11-26 20:39:18,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3561746.6666666665, ans=0.125 2023-11-26 20:39:50,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3561946.6666666665, ans=0.125 2023-11-26 20:39:55,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3561946.6666666665, ans=0.125 2023-11-26 20:39:56,226 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534300 2023-11-26 20:40:00,347 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5250, loss[loss=0.07034, simple_loss=0.09882, pruned_loss=0.01367, audio_tagging_loss=0.007259, over 15757.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08803, pruned_loss=0.01187, audio_tagging_loss=0.008619, over 3036862.37 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:40:02,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3562013.3333333335, ans=0.125 2023-11-26 20:40:09,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3562013.3333333335, ans=0.125 2023-11-26 20:40:22,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3562146.6666666665, ans=0.125 2023-11-26 20:40:42,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3562213.3333333335, ans=0.0 2023-11-26 20:40:46,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3562280.0, ans=15.0 2023-11-26 20:40:53,575 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534350 2023-11-26 20:40:57,746 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5300, loss[loss=0.06655, simple_loss=0.09784, pruned_loss=0.01174, audio_tagging_loss=0.005894, over 15926.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08861, pruned_loss=0.01191, audio_tagging_loss=0.008501, over 3036972.74 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:41:02,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.709e+01 8.749e+01 9.362e+01 1.021e+02 1.179e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 20:41:17,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3562413.3333333335, ans=0.125 2023-11-26 20:41:25,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3562480.0, ans=0.125 2023-11-26 20:41:32,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3562546.6666666665, ans=0.0 2023-11-26 20:41:33,210 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:41:35,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3562546.6666666665, ans=0.1 2023-11-26 20:41:47,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=15.0 2023-11-26 20:41:48,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534400 2023-11-26 20:41:53,342 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5350, loss[loss=0.05981, simple_loss=0.07644, pruned_loss=0.01361, audio_tagging_loss=0.007977, over 15301.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08916, pruned_loss=0.01208, audio_tagging_loss=0.008515, over 3046376.51 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:42:23,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3562813.3333333335, ans=0.1 2023-11-26 20:42:28,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3562880.0, ans=0.1 2023-11-26 20:42:39,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3562946.6666666665, ans=0.1 2023-11-26 20:42:45,104 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534450 2023-11-26 20:42:49,305 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5400, loss[loss=0.05105, simple_loss=0.06985, pruned_loss=0.007462, audio_tagging_loss=0.008664, over 16053.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.0889, pruned_loss=0.012, audio_tagging_loss=0.008528, over 3041002.14 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:42:56,043 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.834e+01 9.520e+01 1.043e+02 1.175e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 20:42:58,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. 
limit=15.0 2023-11-26 20:43:12,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3563146.6666666665, ans=0.125 2023-11-26 20:43:37,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3563280.0, ans=0.125 2023-11-26 20:43:41,987 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534500 2023-11-26 20:43:46,141 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5450, loss[loss=0.06408, simple_loss=0.08771, pruned_loss=0.01287, audio_tagging_loss=0.00736, over 14450.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08983, pruned_loss=0.01201, audio_tagging_loss=0.008589, over 3049425.32 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:43:51,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3563346.6666666665, ans=0.125 2023-11-26 20:43:53,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3563346.6666666665, ans=0.125 2023-11-26 20:44:00,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3563413.3333333335, ans=12.0 2023-11-26 20:44:02,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-26 20:44:33,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3563613.3333333335, ans=0.0 2023-11-26 20:44:37,382 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534550 2023-11-26 20:44:38,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2023-11-26 20:44:41,539 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5500, loss[loss=0.07231, simple_loss=0.09543, pruned_loss=0.01613, audio_tagging_loss=0.008469, over 15116.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09036, pruned_loss=0.01211, audio_tagging_loss=0.008661, over 3051615.43 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:44:47,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.118e+01 9.897e+01 1.074e+02 1.555e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-26 20:45:18,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.60 vs. limit=12.0 2023-11-26 20:45:32,786 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534600 2023-11-26 20:45:35,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3563946.6666666665, ans=0.0 2023-11-26 20:45:37,295 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5550, loss[loss=0.05648, simple_loss=0.07623, pruned_loss=0.009461, audio_tagging_loss=0.008907, over 14356.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09002, pruned_loss=0.01212, audio_tagging_loss=0.00874, over 3053317.60 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:45:41,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. 
limit=6.0 2023-11-26 20:45:46,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3564013.3333333335, ans=0.125 2023-11-26 20:45:57,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3564080.0, ans=0.95 2023-11-26 20:46:17,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3564213.3333333335, ans=0.125 2023-11-26 20:46:22,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3564280.0, ans=0.0 2023-11-26 20:46:22,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3564280.0, ans=0.125 2023-11-26 20:46:29,300 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534650 2023-11-26 20:46:34,584 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5600, loss[loss=0.06206, simple_loss=0.08772, pruned_loss=0.01029, audio_tagging_loss=0.007909, over 16710.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09027, pruned_loss=0.01218, audio_tagging_loss=0.008783, over 3041374.40 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:46:40,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.848e+01 9.516e+01 1.047e+02 1.275e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 20:46:51,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3564413.3333333335, ans=0.0 2023-11-26 20:46:58,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3564480.0, ans=0.125 2023-11-26 20:47:14,940 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:47:23,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2023-11-26 20:47:25,528 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534700 2023-11-26 20:47:29,694 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5650, loss[loss=0.06754, simple_loss=0.0813, pruned_loss=0.01463, audio_tagging_loss=0.01227, over 14605.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08971, pruned_loss=0.01222, audio_tagging_loss=0.008929, over 3044811.30 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:47:31,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3564680.0, ans=0.125 2023-11-26 20:47:36,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3564680.0, ans=0.1 2023-11-26 20:47:45,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3564746.6666666665, ans=0.125 2023-11-26 20:48:09,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3564880.0, ans=0.125 2023-11-26 20:48:15,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3564946.6666666665, ans=0.125 2023-11-26 20:48:20,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3564946.6666666665, ans=0.0 2023-11-26 20:48:21,028 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534750 2023-11-26 20:48:21,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3564946.6666666665, ans=0.2 2023-11-26 20:48:22,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3564946.6666666665, ans=0.125 2023-11-26 20:48:25,199 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5700, loss[loss=0.07817, simple_loss=0.1026, pruned_loss=0.01769, audio_tagging_loss=0.009174, over 14855.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09028, pruned_loss=0.01222, audio_tagging_loss=0.008899, over 3048762.13 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:48:29,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3565013.3333333335, ans=0.125 2023-11-26 20:48:33,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.707e+01 9.299e+01 1.009e+02 1.151e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 20:48:43,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3565080.0, ans=0.125 2023-11-26 20:48:50,880 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:48:57,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3565146.6666666665, ans=0.1 2023-11-26 20:49:06,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3565213.3333333335, ans=0.2 2023-11-26 20:49:16,931 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534800 2023-11-26 20:49:19,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.10 vs. 
limit=15.0 2023-11-26 20:49:20,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3565346.6666666665, ans=0.0 2023-11-26 20:49:21,922 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5750, loss[loss=0.05768, simple_loss=0.07543, pruned_loss=0.009026, audio_tagging_loss=0.01094, over 14530.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08942, pruned_loss=0.01202, audio_tagging_loss=0.008821, over 3054090.28 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:49:43,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3565480.0, ans=0.05 2023-11-26 20:49:45,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3565480.0, ans=0.125 2023-11-26 20:49:50,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3565480.0, ans=0.125 2023-11-26 20:50:01,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3565546.6666666665, ans=0.1 2023-11-26 20:50:12,606 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534850 2023-11-26 20:50:16,757 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5800, loss[loss=0.04705, simple_loss=0.06179, pruned_loss=0.006659, audio_tagging_loss=0.009494, over 13960.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.0885, pruned_loss=0.01183, audio_tagging_loss=0.008716, over 3046089.49 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:50:23,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3565680.0, ans=0.1 2023-11-26 20:50:24,134 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.906e+01 9.529e+01 1.040e+02 1.512e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 20:50:36,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.29 vs. limit=10.0 2023-11-26 20:51:05,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3565946.6666666665, ans=0.0 2023-11-26 20:51:07,138 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534900 2023-11-26 20:51:08,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3565946.6666666665, ans=0.125 2023-11-26 20:51:11,323 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5850, loss[loss=0.05988, simple_loss=0.08175, pruned_loss=0.01083, audio_tagging_loss=0.008183, over 15420.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08837, pruned_loss=0.01176, audio_tagging_loss=0.008688, over 3042423.25 frames. 
2023-11-26 20:51:12,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3566013.3333333335, ans=0.1
2023-11-26 20:51:15,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566013.3333333335, ans=0.1
2023-11-26 20:51:19,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3566013.3333333335, ans=0.2
2023-11-26 20:51:23,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3566080.0, ans=0.125
2023-11-26 20:51:27,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=22.5
2023-11-26 20:51:29,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.22 vs. limit=12.0
2023-11-26 20:51:36,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3566146.6666666665, ans=0.0
2023-11-26 20:51:43,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566213.3333333335, ans=0.1
2023-11-26 20:51:53,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3566280.0, ans=0.0
2023-11-26 20:51:58,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3566280.0, ans=0.0
2023-11-26 20:52:01,262 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534950
2023-11-26 20:52:05,960 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5900, loss[loss=0.06688, simple_loss=0.1001, pruned_loss=0.01095, audio_tagging_loss=0.005864, over 15358.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.0887, pruned_loss=0.0118, audio_tagging_loss=0.008617, over 3047988.99 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:52:07,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3566346.6666666665, ans=0.07
2023-11-26 20:52:14,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.767e+01 9.381e+01 1.012e+02 1.422e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-26 20:52:15,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0
2023-11-26 20:52:27,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3566480.0, ans=0.0
2023-11-26 20:52:38,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3566546.6666666665, ans=0.125
2023-11-26 20:52:47,270 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:52:56,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566613.3333333335, ans=0.1
2023-11-26 20:52:57,793 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535000
2023-11-26 20:53:02,266 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5950, loss[loss=0.03687, simple_loss=0.04861, pruned_loss=0.003557, audio_tagging_loss=0.009006, over 14692.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08818, pruned_loss=0.01157, audio_tagging_loss=0.008681, over 3047479.44 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:53:06,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3566680.0, ans=0.1
2023-11-26 20:53:21,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3566746.6666666665, ans=0.1
2023-11-26 20:53:29,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2023-11-26 20:53:53,245 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535050
2023-11-26 20:53:57,414 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6000, loss[loss=0.05253, simple_loss=0.07628, pruned_loss=0.005982, audio_tagging_loss=0.008403, over 14425.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08839, pruned_loss=0.01145, audio_tagging_loss=0.008601, over 3045316.03 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:53:57,414 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-26 20:54:29,552 INFO [train_asr.py:1267] (1/4) Epoch 45, validation: loss=0.05766, simple_loss=0.05058, pruned_loss=0.005348, audio_tagging_loss=0.02702, over 4681554.00 frames.
2023-11-26 20:54:29,553 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-26 20:54:30,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3567013.3333333335, ans=0.125
2023-11-26 20:54:37,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.765e+01 9.407e+01 1.018e+02 1.240e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-26 20:55:09,177 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
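The "Exclude cut" warning above drops an utterance whose encoder output would be too short to align with its token sequence: 100 input frames shrink to 23 after 4x subsampling, which is fewer than the 24 BPE tokens. A minimal sketch of that filter is below; the function name and the exact subsampling arithmetic (two stride-2 convolutions) are assumptions that happen to reproduce the logged 100 -> 23, not the verbatim train_asr.py code.

```python
def keep_cut(num_frames: int, tokens: list) -> bool:
    # Assumed frame count after the encoder_embed's two stride-2 convs:
    # (100 - 7) // 2 + 1 = 47, then 47 // 2 = 23, matching the log.
    t = ((num_frames - 7) // 2 + 1) // 2
    # Exclude when there are fewer output frames than tokens to align.
    return t >= len(tokens)

# For the logged cut: 23 frames vs. 24 tokens -> excluded.
assert keep_cut(100, ["tok"] * 24) is False
```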
2023-11-26 20:55:20,928 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535100
2023-11-26 20:55:25,119 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6050, loss[loss=0.06653, simple_loss=0.09627, pruned_loss=0.01052, audio_tagging_loss=0.007873, over 15831.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08926, pruned_loss=0.01174, audio_tagging_loss=0.008493, over 3045991.41 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:55:47,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3567480.0, ans=0.125
2023-11-26 20:56:03,918 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:56:11,325 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:56:16,562 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535150
2023-11-26 20:56:20,757 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6100, loss[loss=0.07604, simple_loss=0.1036, pruned_loss=0.01454, audio_tagging_loss=0.009686, over 14842.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08874, pruned_loss=0.01172, audio_tagging_loss=0.008524, over 3046402.24 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:56:28,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.965e+01 9.690e+01 1.035e+02 1.368e+02, threshold=1.938e+02, percent-clipped=0.0
2023-11-26 20:56:37,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3567746.6666666665, ans=0.0
2023-11-26 20:56:38,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0
2023-11-26 20:56:57,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3567880.0, ans=0.125
2023-11-26 20:57:08,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3567946.6666666665, ans=0.0
2023-11-26 20:57:11,463 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535200
2023-11-26 20:57:15,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3568013.3333333335, ans=0.0
2023-11-26 20:57:17,036 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6150, loss[loss=0.07142, simple_loss=0.1002, pruned_loss=0.0152, audio_tagging_loss=0.006114, over 14898.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08847, pruned_loss=0.01182, audio_tagging_loss=0.008556, over 3043442.42 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:57:19,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0
2023-11-26 20:57:21,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3568013.3333333335, ans=0.2
2023-11-26 20:57:32,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3568080.0, ans=0.0
2023-11-26 20:57:54,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3568213.3333333335, ans=15.0
2023-11-26 20:58:08,728 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535250
2023-11-26 20:58:13,472 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6200, loss[loss=0.04164, simple_loss=0.05327, pruned_loss=0.005972, audio_tagging_loss=0.009039, over 15229.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08838, pruned_loss=0.01186, audio_tagging_loss=0.008706, over 3043756.57 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:58:19,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3568346.6666666665, ans=0.125
2023-11-26 20:58:20,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3568346.6666666665, ans=0.125
2023-11-26 20:58:20,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.899e+01 9.421e+01 1.012e+02 1.333e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-26 20:58:21,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3568346.6666666665, ans=0.125
2023-11-26 20:58:24,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0
2023-11-26 20:58:32,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0
2023-11-26 20:58:38,350 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:58:53,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3568546.6666666665, ans=0.125
2023-11-26 20:59:04,005 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535300
2023-11-26 20:59:07,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3568680.0, ans=0.125
2023-11-26 20:59:07,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5
2023-11-26 20:59:07,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=22.5
2023-11-26 20:59:08,233 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6250, loss[loss=0.05464, simple_loss=0.07355, pruned_loss=0.01059, audio_tagging_loss=0.007276, over 14362.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08714, pruned_loss=0.01184, audio_tagging_loss=0.00885, over 3040734.89 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0
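The many "ScheduledFloat: name=..., batch_count=..., ans=..." lines above record hyperparameters (dropout probabilities, skip rates, balancer bounds) whose current value is a function of batch_count. A hedged sketch of a piecewise-linear schedule that would produce such log lines is below; the class and its interpolation rule are illustrative assumptions, not the scaling.py implementation.

```python
import bisect

class ScheduledFloatSketch:
    """A float that interpolates linearly between (batch_count, value) knots."""

    def __init__(self, *points):
        # points: e.g. (0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0)
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A skip-rate that decays to 0.0 early in training would log ans=0.0
# at the batch counts seen above (~3.57e6), its final value:
skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(3568080.0))  # -> 0.0
```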
2023-11-26 20:59:10,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3568680.0, ans=0.125
2023-11-26 20:59:22,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0
2023-11-26 20:59:25,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3568746.6666666665, ans=0.0
2023-11-26 20:59:25,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3568746.6666666665, ans=10.0
2023-11-26 20:59:28,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3568746.6666666665, ans=0.0
2023-11-26 20:59:31,552 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:59:44,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3568880.0, ans=0.0
2023-11-26 20:59:52,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3568946.6666666665, ans=0.2
2023-11-26 20:59:52,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3568946.6666666665, ans=0.125
2023-11-26 20:59:54,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3568946.6666666665, ans=0.125
2023-11-26 20:59:54,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0
2023-11-26 20:59:57,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3568946.6666666665, ans=0.125
2023-11-26 20:59:58,306 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535350
2023-11-26 21:00:02,489 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6300, loss[loss=0.06405, simple_loss=0.08697, pruned_loss=0.01171, audio_tagging_loss=0.00885, over 14612.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08869, pruned_loss=0.01202, audio_tagging_loss=0.008854, over 3045327.98 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:00:07,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=22.5
2023-11-26 21:00:07,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3569013.3333333335, ans=0.2
2023-11-26 21:00:08,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3569013.3333333335, ans=0.5
2023-11-26 21:00:12,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.840e+01 9.586e+01 1.026e+02 1.198e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-26 21:00:16,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=22.5
2023-11-26 21:00:40,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0
2023-11-26 21:00:54,198 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535400
2023-11-26 21:00:58,581 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6350, loss[loss=0.06331, simple_loss=0.08484, pruned_loss=0.01091, audio_tagging_loss=0.009978, over 15996.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08854, pruned_loss=0.01196, audio_tagging_loss=0.008843, over 3048160.82 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:01:04,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3569346.6666666665, ans=0.0
2023-11-26 21:01:09,069 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:01:17,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=12.0
2023-11-26 21:01:19,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3569480.0, ans=0.125
2023-11-26 21:01:49,240 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535450
2023-11-26 21:01:53,953 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6400, loss[loss=0.06689, simple_loss=0.09274, pruned_loss=0.01375, audio_tagging_loss=0.006767, over 14791.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08866, pruned_loss=0.01195, audio_tagging_loss=0.00887, over 3049815.63 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:01:54,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3569680.0, ans=0.125
2023-11-26 21:02:02,596 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.580e+01 9.385e+01 1.005e+02 1.222e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-26 21:02:08,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3569746.6666666665, ans=0.0
2023-11-26 21:02:15,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.99 vs. limit=10.0
2023-11-26 21:02:30,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3569880.0, ans=0.0
2023-11-26 21:02:32,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3569880.0, ans=0.125
2023-11-26 21:02:39,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3569946.6666666665, ans=0.125
2023-11-26 21:02:41,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3569946.6666666665, ans=0.1
2023-11-26 21:02:44,668 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535500
2023-11-26 21:02:48,859 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6450, loss[loss=0.05142, simple_loss=0.07187, pruned_loss=0.005539, audio_tagging_loss=0.009949, over 16033.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08876, pruned_loss=0.01192, audio_tagging_loss=0.008935, over 3046125.15 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0
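The "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." lines from optim.py above are consistent with a clipping rule of threshold = clipping_scale * median of recent gradient norms (e.g. 2.0 * 9.586e+01 ≈ 1.917e+02). Below is a hedged sketch of such a policy; it illustrates the idea only, and the bookkeeping in the actual ScaledAdam optimizer differs in detail (per-batch scheduling, quartile logging cadence, etc.).

```python
from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent global gradient norms

    def clip_(self, params) -> float:
        grads = [p.grad.norm() for p in params if p.grad is not None]
        norm = torch.norm(torch.stack(grads)).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:  # counted as "percent-clipped" in the log
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return norm
```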
2023-11-26 21:02:49,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3570013.3333333335, ans=0.0
2023-11-26 21:02:59,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3570080.0, ans=0.0
2023-11-26 21:03:01,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3570080.0, ans=0.1
2023-11-26 21:03:08,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3570080.0, ans=0.125
2023-11-26 21:03:11,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0
2023-11-26 21:03:16,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.31 vs. limit=22.5
2023-11-26 21:03:27,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.77 vs. limit=10.0
2023-11-26 21:03:40,706 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535550
2023-11-26 21:03:44,899 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6500, loss[loss=0.06032, simple_loss=0.08538, pruned_loss=0.009958, audio_tagging_loss=0.007675, over 15140.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08881, pruned_loss=0.01201, audio_tagging_loss=0.008927, over 3044691.68 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:03:53,420 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.670e+01 9.516e+01 1.047e+02 1.238e+02, threshold=1.903e+02, percent-clipped=0.0
2023-11-26 21:03:59,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3570413.3333333335, ans=0.2
2023-11-26 21:04:12,266 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:04:22,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3570546.6666666665, ans=0.0
2023-11-26 21:04:24,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3570546.6666666665, ans=0.0
2023-11-26 21:04:27,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3570546.6666666665, ans=0.125
2023-11-26 21:04:35,325 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535600
2023-11-26 21:04:39,830 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6550, loss[loss=0.06867, simple_loss=0.09034, pruned_loss=0.01575, audio_tagging_loss=0.007757, over 15402.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08905, pruned_loss=0.01209, audio_tagging_loss=0.008868, over 3045935.37 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:05:01,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2023-11-26 21:05:13,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3570880.0, ans=0.125
2023-11-26 21:05:14,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3570880.0, ans=0.0
2023-11-26 21:05:21,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.61 vs. limit=22.5
2023-11-26 21:05:31,087 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535650
2023-11-26 21:05:32,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3570946.6666666665, ans=0.0
2023-11-26 21:05:35,375 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6600, loss[loss=0.06697, simple_loss=0.08821, pruned_loss=0.01512, audio_tagging_loss=0.007744, over 14731.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08884, pruned_loss=0.01197, audio_tagging_loss=0.008744, over 3039470.42 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:05:44,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.935e+01 9.455e+01 1.019e+02 1.266e+02, threshold=1.891e+02, percent-clipped=0.0
2023-11-26 21:05:50,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=3571080.0, ans=12.0
2023-11-26 21:05:53,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0
2023-11-26 21:06:21,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3571280.0, ans=0.125
2023-11-26 21:06:26,884 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535700
2023-11-26 21:06:30,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3571346.6666666665, ans=0.125
2023-11-26 21:06:30,991 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6650, loss[loss=0.05924, simple_loss=0.08556, pruned_loss=0.008383, audio_tagging_loss=0.008077, over 15397.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08934, pruned_loss=0.01209, audio_tagging_loss=0.00873, over 3037308.46 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:06:34,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3571346.6666666665, ans=0.125
2023-11-26 21:06:42,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3571413.3333333335, ans=0.125
2023-11-26 21:06:43,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=12.0
2023-11-26 21:06:46,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3571413.3333333335, ans=0.2
2023-11-26 21:06:46,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5
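The "Whitening: name=..., metric=... vs. limit=..." lines above compare a per-module "whiteness" diagnostic of the activations against a limit; the module appears to push back with an auxiliary loss only when the limit is exceeded. The metric sketched below, the mean squared eigenvalue of the feature covariance divided by the squared mean eigenvalue per group, is an assumption chosen so that perfectly white features score 1.0; scaling.py may define the quantity differently.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into groups.
    metrics = []
    for g in x.chunk(num_groups, dim=1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / g.shape[0]
        mean_eig = torch.diagonal(cov).mean()           # trace(cov) / C
        mean_sq_eig = torch.diagonal(cov @ cov).mean()  # trace(cov @ cov) / C
        metrics.append((mean_sq_eig / (mean_eig ** 2 + 1e-20)).item())
    return max(metrics)

# White (isotropic) features score ~1.0, well under a limit like 10.0:
print(whitening_metric(torch.randn(2000, 384)), "vs. limit=10.0")
```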
2023-11-26 21:06:50,295 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:06:58,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3571480.0, ans=0.125
2023-11-26 21:07:04,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5
2023-11-26 21:07:09,573 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:07:13,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3571613.3333333335, ans=0.1
2023-11-26 21:07:19,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3571613.3333333335, ans=0.125
2023-11-26 21:07:21,153 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535750
2023-11-26 21:07:25,285 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6700, loss[loss=0.04228, simple_loss=0.05374, pruned_loss=0.006912, audio_tagging_loss=0.008499, over 15121.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08964, pruned_loss=0.01218, audio_tagging_loss=0.008625, over 3032664.47 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:07:34,795 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.689e+01 9.559e+01 1.023e+02 3.616e+02, threshold=1.912e+02, percent-clipped=1.0
2023-11-26 21:08:01,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3571880.0, ans=0.125
2023-11-26 21:08:15,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535800
2023-11-26 21:08:16,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5
2023-11-26 21:08:20,251 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6750, loss[loss=0.07416, simple_loss=0.1031, pruned_loss=0.01576, audio_tagging_loss=0.006873, over 15203.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08965, pruned_loss=0.01211, audio_tagging_loss=0.008586, over 3030347.80 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:08:32,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3572080.0, ans=0.125
2023-11-26 21:08:35,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0
2023-11-26 21:09:10,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3572280.0, ans=0.1
2023-11-26 21:09:11,880 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535850
2023-11-26 21:09:16,569 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6800, loss[loss=0.03917, simple_loss=0.05061, pruned_loss=0.005492, audio_tagging_loss=0.008377, over 15947.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08963, pruned_loss=0.01216, audio_tagging_loss=0.008515, over 3032876.45 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:09:23,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3572346.6666666665, ans=0.1
2023-11-26 21:09:26,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.870e+01 9.420e+01 1.023e+02 1.274e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-26 21:09:27,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3572413.3333333335, ans=0.125
2023-11-26 21:09:48,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3572546.6666666665, ans=0.125
2023-11-26 21:09:59,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3572613.3333333335, ans=0.0
2023-11-26 21:10:07,128 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535900
2023-11-26 21:10:11,384 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6850, loss[loss=0.07776, simple_loss=0.115, pruned_loss=0.01281, audio_tagging_loss=0.007436, over 15353.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.0906, pruned_loss=0.0122, audio_tagging_loss=0.008426, over 3034689.70 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:10:15,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3572680.0, ans=0.125
2023-11-26 21:10:41,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3572813.3333333335, ans=0.0
2023-11-26 21:10:43,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3572813.3333333335, ans=0.1
2023-11-26 21:10:51,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3572880.0, ans=0.1
2023-11-26 21:11:02,624 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535950
2023-11-26 21:11:06,890 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6900, loss[loss=0.06823, simple_loss=0.09406, pruned_loss=0.01248, audio_tagging_loss=0.008719, over 15226.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09102, pruned_loss=0.01211, audio_tagging_loss=0.008419, over 3040789.50 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:11:18,670 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.747e+01 9.465e+01 1.018e+02 1.501e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-26 21:11:25,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3573080.0, ans=0.2
2023-11-26 21:11:31,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3573146.6666666665, ans=0.2
2023-11-26 21:11:36,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3573146.6666666665, ans=0.2
2023-11-26 21:11:38,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0
2023-11-26 21:11:44,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3573213.3333333335, ans=0.0
2023-11-26 21:11:50,321 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 21:11:55,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0
2023-11-26 21:11:57,691 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536000
2023-11-26 21:12:05,242 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6950, loss[loss=0.06024, simple_loss=0.08627, pruned_loss=0.01022, audio_tagging_loss=0.006894, over 15895.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09062, pruned_loss=0.0121, audio_tagging_loss=0.008493, over 3049704.49 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:12:05,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3573346.6666666665, ans=0.125
2023-11-26 21:12:11,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3573346.6666666665, ans=0.125
2023-11-26 21:12:12,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3573346.6666666665, ans=0.2
2023-11-26 21:12:16,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3573413.3333333335, ans=0.0
2023-11-26 21:12:19,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3573413.3333333335, ans=0.1
2023-11-26 21:12:23,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3573413.3333333335, ans=0.2
2023-11-26 21:12:25,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3573413.3333333335, ans=0.125
2023-11-26 21:12:56,833 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536050
2023-11-26 21:13:00,999 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7000, loss[loss=0.05643, simple_loss=0.07277, pruned_loss=0.008787, audio_tagging_loss=0.01125, over 14560.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09035, pruned_loss=0.0121, audio_tagging_loss=0.008462, over 3052954.24 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:13:12,670 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.901e+01 9.470e+01 1.019e+02 1.225e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-26 21:13:12,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3573746.6666666665, ans=0.0
2023-11-26 21:13:24,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3573813.3333333335, ans=0.07
2023-11-26 21:13:32,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3573813.3333333335, ans=0.125
2023-11-26 21:13:38,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3573880.0, ans=0.125
2023-11-26 21:13:38,322 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:13:44,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3573946.6666666665, ans=0.125
2023-11-26 21:13:51,865 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536100
2023-11-26 21:13:56,031 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7050, loss[loss=0.0547, simple_loss=0.06278, pruned_loss=0.01197, audio_tagging_loss=0.01134, over 15715.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08883, pruned_loss=0.01199, audio_tagging_loss=0.008584, over 3046750.78 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:14:05,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3574080.0, ans=0.125
2023-11-26 21:14:21,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3574146.6666666665, ans=10.0
2023-11-26 21:14:22,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.79 vs. limit=6.0
2023-11-26 21:14:32,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3574213.3333333335, ans=0.125
2023-11-26 21:14:39,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.86 vs. limit=22.5
2023-11-26 21:14:46,447 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536150
2023-11-26 21:14:48,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3574280.0, ans=0.125
2023-11-26 21:14:51,232 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7100, loss[loss=0.09146, simple_loss=0.1379, pruned_loss=0.01737, audio_tagging_loss=0.005125, over 16149.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08915, pruned_loss=0.0122, audio_tagging_loss=0.008665, over 3054594.53 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0
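The grad_scale field in the loss lines drifts between 8.0, 16.0 and 32.0, which is the signature of a dynamic fp16 loss scale that halves on overflow and grows back after a run of clean steps. A hedged sketch of that mechanism using the stock PyTorch GradScaler, which implements the same halve/grow policy, is below; the init_scale and growth_interval values are illustrative, not taken from the training script.

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def training_step(model, optimizer, features, compute_loss):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = compute_loss(model(features))
    scaler.scale(loss).backward()  # backprop on the scaled loss
    scaler.step(optimizer)         # skips the update if inf/nan grads were found
    scaler.update()                # halves the scale on overflow, else may grow it
    return loss.detach(), scaler.get_scale()  # get_scale() is what gets logged
```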
2023-11-26 21:15:04,199 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.863e+01 9.458e+01 1.036e+02 1.512e+02, threshold=1.892e+02, percent-clipped=0.0
2023-11-26 21:15:06,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3574413.3333333335, ans=0.125
2023-11-26 21:15:10,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0
2023-11-26 21:15:15,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3574480.0, ans=0.05
2023-11-26 21:15:36,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3574613.3333333335, ans=0.1
2023-11-26 21:15:36,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5
2023-11-26 21:15:43,253 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536200
2023-11-26 21:15:47,682 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7150, loss[loss=0.05551, simple_loss=0.06992, pruned_loss=0.01, audio_tagging_loss=0.01055, over 15190.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08962, pruned_loss=0.01235, audio_tagging_loss=0.008748, over 3051207.19 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:15:49,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3574680.0, ans=0.2
2023-11-26 21:15:57,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3574746.6666666665, ans=0.0
2023-11-26 21:16:10,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0
2023-11-26 21:16:31,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3574946.6666666665, ans=0.125
2023-11-26 21:16:37,854 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536250
2023-11-26 21:16:42,033 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7200, loss[loss=0.06623, simple_loss=0.08926, pruned_loss=0.01133, audio_tagging_loss=0.01027, over 15087.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08993, pruned_loss=0.01232, audio_tagging_loss=0.008797, over 3047824.57 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:16:42,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=12.0
2023-11-26 21:16:49,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3575013.3333333335, ans=0.2
2023-11-26 21:16:53,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.982e+01 9.532e+01 1.041e+02 1.325e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-26 21:17:01,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3575080.0, ans=0.125
2023-11-26 21:17:06,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0
2023-11-26 21:17:14,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.64 vs. limit=12.0
2023-11-26 21:17:32,469 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536300
2023-11-26 21:17:36,680 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7250, loss[loss=0.07088, simple_loss=0.09956, pruned_loss=0.01378, audio_tagging_loss=0.007321, over 13533.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09014, pruned_loss=0.01244, audio_tagging_loss=0.008865, over 3050643.24 frames. ], batch size: 52, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:17:51,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3575413.3333333335, ans=0.125
2023-11-26 21:18:05,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3575480.0, ans=0.0
2023-11-26 21:18:12,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3575546.6666666665, ans=0.125
2023-11-26 21:18:28,385 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536350
2023-11-26 21:18:28,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3575613.3333333335, ans=0.09899494936611666
2023-11-26 21:18:30,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3575613.3333333335, ans=0.0
2023-11-26 21:18:33,115 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7300, loss[loss=0.05082, simple_loss=0.0678, pruned_loss=0.008562, audio_tagging_loss=0.00836, over 16072.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08947, pruned_loss=0.01238, audio_tagging_loss=0.00888, over 3043311.10 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:18:36,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3575680.0, ans=0.07
2023-11-26 21:18:38,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3575680.0, ans=0.2
2023-11-26 21:18:41,907 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:18:45,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.748e+01 9.464e+01 1.022e+02 1.262e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-26 21:19:09,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3575880.0, ans=0.2
2023-11-26 21:19:20,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3575946.6666666665, ans=0.125
2023-11-26 21:19:20,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3575946.6666666665, ans=0.125
2023-11-26 21:19:23,604 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536400
2023-11-26 21:19:28,018 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7350, loss[loss=0.04546, simple_loss=0.05742, pruned_loss=0.00745, audio_tagging_loss=0.009306, over 16411.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08882, pruned_loss=0.01217, audio_tagging_loss=0.008756, over 3043713.49 frames. ], batch size: 64, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:19:37,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3576080.0, ans=0.1
2023-11-26 21:19:40,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3576080.0, ans=0.125
2023-11-26 21:19:49,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=22.5
2023-11-26 21:19:50,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3576146.6666666665, ans=0.2
2023-11-26 21:20:18,439 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536450
2023-11-26 21:20:22,638 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7400, loss[loss=0.06863, simple_loss=0.102, pruned_loss=0.01162, audio_tagging_loss=0.006007, over 15111.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08828, pruned_loss=0.01214, audio_tagging_loss=0.008643, over 3045113.31 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:20:26,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3576346.6666666665, ans=0.1
2023-11-26 21:20:36,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.979e+01 9.560e+01 1.029e+02 2.303e+02, threshold=1.912e+02, percent-clipped=1.0
2023-11-26 21:20:41,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3576413.3333333335, ans=0.2
2023-11-26 21:20:49,094 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:20:52,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0
2023-11-26 21:21:03,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3576546.6666666665, ans=0.125
2023-11-26 21:21:14,566 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536500
2023-11-26 21:21:14,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3576613.3333333335, ans=0.0
2023-11-26 21:21:17,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3576680.0, ans=0.125
2023-11-26 21:21:18,751 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7450, loss[loss=0.06896, simple_loss=0.1022, pruned_loss=0.0101, audio_tagging_loss=0.007751, over 15909.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08843, pruned_loss=0.01212, audio_tagging_loss=0.008637, over 3044479.17 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:21:25,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3576680.0, ans=0.125
2023-11-26 21:21:28,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0
2023-11-26 21:21:40,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3576813.3333333335, ans=0.0
2023-11-26 21:22:09,796 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536550
2023-11-26 21:22:10,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3576946.6666666665, ans=0.1
2023-11-26 21:22:10,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3576946.6666666665, ans=0.0
2023-11-26 21:22:13,913 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7500, loss[loss=0.06842, simple_loss=0.08527, pruned_loss=0.01466, audio_tagging_loss=0.01113, over 13748.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08899, pruned_loss=0.01229, audio_tagging_loss=0.008546, over 3048531.91 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:22:16,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0
2023-11-26 21:22:18,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3577013.3333333335, ans=0.0
2023-11-26 21:22:26,631 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.830e+01 9.434e+01 1.016e+02 1.615e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 21:22:46,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3577213.3333333335, ans=0.125
2023-11-26 21:22:52,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3577213.3333333335, ans=0.1
2023-11-26 21:22:53,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3577213.3333333335, ans=0.125
2023-11-26 21:23:00,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0
2023-11-26 21:23:04,446 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536600
2023-11-26 21:23:08,919 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7550, loss[loss=0.05765, simple_loss=0.0833, pruned_loss=0.009388, audio_tagging_loss=0.006609, over 14700.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08911, pruned_loss=0.01228, audio_tagging_loss=0.008509, over 3045387.29 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:23:11,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3577346.6666666665, ans=0.125
2023-11-26 21:23:21,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3577413.3333333335, ans=0.1
2023-11-26 21:23:23,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0
2023-11-26 21:23:30,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3577480.0, ans=0.125
2023-11-26 21:23:40,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0
2023-11-26 21:23:51,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3577546.6666666665, ans=0.2
2023-11-26 21:24:00,016 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536650
2023-11-26 21:24:04,761 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7600, loss[loss=0.05366, simple_loss=0.07354, pruned_loss=0.00808, audio_tagging_loss=0.008811, over 14909.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.0889, pruned_loss=0.01238, audio_tagging_loss=0.008525, over 3048984.20 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:24:17,499 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.794e+01 9.367e+01 9.817e+01 1.272e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-26 21:24:56,191 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536700
2023-11-26 21:24:56,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3577946.6666666665, ans=0.0
2023-11-26 21:25:00,345 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7650, loss[loss=0.07036, simple_loss=0.1043, pruned_loss=0.01137, audio_tagging_loss=0.006824, over 16252.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.0886, pruned_loss=0.01219, audio_tagging_loss=0.008457, over 3040285.71 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:25:11,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3578080.0, ans=0.2
2023-11-26 21:25:19,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3578080.0, ans=0.125
2023-11-26 21:25:27,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3578146.6666666665, ans=0.0
2023-11-26 21:25:48,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3578280.0, ans=0.0
2023-11-26 21:25:52,124 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536750
2023-11-26 21:25:56,379 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7700, loss[loss=0.06444, simple_loss=0.08309, pruned_loss=0.01284, audio_tagging_loss=0.01005, over 15340.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08945, pruned_loss=0.01222, audio_tagging_loss=0.008456, over 3035321.69 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:26:10,143 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.777e+01 9.451e+01 1.024e+02 1.236e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-26 21:26:12,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=12.0
2023-11-26 21:26:14,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3578413.3333333335, ans=0.05
2023-11-26 21:26:34,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3578546.6666666665, ans=0.0
2023-11-26 21:26:41,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3578613.3333333335, ans=0.0
2023-11-26 21:26:43,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3578613.3333333335, ans=0.125
2023-11-26 21:26:44,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3578613.3333333335, ans=0.1
2023-11-26 21:26:47,864 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536800
2023-11-26 21:26:52,885 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7750, loss[loss=0.09028, simple_loss=0.1271, pruned_loss=0.01848, audio_tagging_loss=0.008272, over 16086.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0891, pruned_loss=0.01226, audio_tagging_loss=0.008577, over 3039304.12 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
], tot_loss[loss=0.06539, simple_loss=0.0891, pruned_loss=0.01226, audio_tagging_loss=0.008577, over 3039304.12 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:26:58,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2023-11-26 21:27:04,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3578746.6666666665, ans=0.04949747468305833 2023-11-26 21:27:07,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3578746.6666666665, ans=0.125 2023-11-26 21:27:44,479 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536850 2023-11-26 21:27:46,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3578946.6666666665, ans=0.05 2023-11-26 21:27:48,634 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7800, loss[loss=0.07089, simple_loss=0.09375, pruned_loss=0.01387, audio_tagging_loss=0.01014, over 14473.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.0887, pruned_loss=0.01199, audio_tagging_loss=0.008708, over 3042176.23 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:27:49,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3579013.3333333335, ans=0.0 2023-11-26 21:28:01,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 9.125e+01 9.673e+01 1.032e+02 1.227e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-26 21:28:12,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2023-11-26 21:28:14,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3579146.6666666665, ans=0.02 2023-11-26 21:28:15,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2023-11-26 21:28:25,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3579213.3333333335, ans=0.125 2023-11-26 21:28:32,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3579280.0, ans=0.0 2023-11-26 21:28:35,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3579280.0, ans=0.0 2023-11-26 21:28:39,501 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536900 2023-11-26 21:28:44,316 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7850, loss[loss=0.1032, simple_loss=0.1474, pruned_loss=0.02025, audio_tagging_loss=0.009292, over 15764.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08944, pruned_loss=0.0122, audio_tagging_loss=0.008779, over 3037572.49 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:29:04,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3579413.3333333335, ans=0.0 2023-11-26 21:29:17,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2023-11-26 21:29:35,295 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536950 2023-11-26 21:29:35,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3579613.3333333335, ans=0.2 2023-11-26 21:29:39,965 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7900, loss[loss=0.06319, simple_loss=0.08803, pruned_loss=0.01225, audio_tagging_loss=0.006922, over 15528.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08922, pruned_loss=0.01213, audio_tagging_loss=0.008823, over 3037807.74 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:29:47,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.65 vs. limit=10.0 2023-11-26 21:29:49,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3579680.0, ans=0.07 2023-11-26 21:29:50,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2023-11-26 21:29:53,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.961e+01 9.633e+01 1.012e+02 1.259e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-26 21:30:12,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3579880.0, ans=0.2 2023-11-26 21:30:25,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3579946.6666666665, ans=0.125 2023-11-26 21:30:32,172 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537000 2023-11-26 21:30:36,640 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7950, loss[loss=0.06573, simple_loss=0.08468, pruned_loss=0.01386, audio_tagging_loss=0.009532, over 15987.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08931, pruned_loss=0.01216, audio_tagging_loss=0.008916, over 3045218.29 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:30:42,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3580013.3333333335, ans=0.0 2023-11-26 21:30:42,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2023-11-26 21:30:44,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3580013.3333333335, ans=0.0 2023-11-26 21:30:50,450 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:30:52,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3580080.0, ans=0.0 2023-11-26 21:31:02,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3580146.6666666665, ans=0.0 2023-11-26 21:31:06,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0 2023-11-26 21:31:13,020 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:31:23,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=22.5 2023-11-26 21:31:27,710 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537050 2023-11-26 21:31:29,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=22.5 2023-11-26 21:31:31,827 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8000, loss[loss=0.06279, simple_loss=0.08353, pruned_loss=0.01153, audio_tagging_loss=0.009497, over 16050.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08925, pruned_loss=0.0121, audio_tagging_loss=0.008914, over 3043731.49 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:31:36,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3580346.6666666665, ans=0.1 2023-11-26 21:31:45,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.727e+01 9.223e+01 9.988e+01 1.687e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 21:31:46,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2023-11-26 21:31:49,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=22.5 2023-11-26 21:32:11,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3580546.6666666665, ans=15.0 2023-11-26 21:32:13,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3580546.6666666665, ans=0.125 2023-11-26 21:32:13,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3580546.6666666665, ans=0.0 2023-11-26 21:32:22,920 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537100 2023-11-26 21:32:24,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3580613.3333333335, ans=0.125 2023-11-26 21:32:27,639 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8050, loss[loss=0.07615, simple_loss=0.1008, pruned_loss=0.0142, audio_tagging_loss=0.01156, over 15291.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08942, pruned_loss=0.01213, audio_tagging_loss=0.008962, over 3049244.34 frames. 
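Throughout this section the four logged loss components satisfy one fixed linear relation; for the batch 8000 running averages just above:

```latex
\mathcal{L}_{\text{tot}}
  = 0.5\,\mathcal{L}_{\text{simple}}
  + \mathcal{L}_{\text{pruned}}
  + \mathcal{L}_{\text{audio\_tagging}},
\qquad
0.5 \times 0.08925 + 0.0121 + 0.008914 = 0.06564 .
```

The 0.5 weight on the simple (linear-lattice) loss is inferred from the logged numbers themselves, not read from the configuration; the same check reproduces the other `tot_loss` values in this stretch.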
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:32:43,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2023-11-26 21:32:57,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=12.0 2023-11-26 21:33:12,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.51 vs. limit=22.5 2023-11-26 21:33:14,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3580946.6666666665, ans=15.0 2023-11-26 21:33:14,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2023-11-26 21:33:19,948 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537150 2023-11-26 21:33:22,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2023-11-26 21:33:24,163 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8100, loss[loss=0.05655, simple_loss=0.06848, pruned_loss=0.0111, audio_tagging_loss=0.01121, over 14468.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08971, pruned_loss=0.0123, audio_tagging_loss=0.008858, over 3038562.36 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:33:28,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3581013.3333333335, ans=0.0 2023-11-26 21:33:36,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.942e+01 9.751e+01 1.046e+02 1.316e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-26 21:33:43,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3581080.0, ans=0.2 2023-11-26 21:34:15,192 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537200 2023-11-26 21:34:19,667 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8150, loss[loss=0.07169, simple_loss=0.1062, pruned_loss=0.01136, audio_tagging_loss=0.007232, over 15383.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08976, pruned_loss=0.01242, audio_tagging_loss=0.008743, over 3047582.66 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:34:58,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3581546.6666666665, ans=0.2 2023-11-26 21:35:08,699 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:35:10,739 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537250 2023-11-26 21:35:15,020 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8200, loss[loss=0.05809, simple_loss=0.07356, pruned_loss=0.009874, audio_tagging_loss=0.01144, over 14368.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08994, pruned_loss=0.01229, audio_tagging_loss=0.008626, over 3045990.57 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:35:17,763 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:35:23,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3581680.0, ans=0.0 2023-11-26 21:35:29,851 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.811e+01 9.586e+01 1.032e+02 1.518e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 21:35:55,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3581880.0, ans=10.0 2023-11-26 21:36:07,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3581946.6666666665, ans=0.125 2023-11-26 21:36:08,197 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537300 2023-11-26 21:36:08,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3581946.6666666665, ans=0.2 2023-11-26 21:36:11,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3582013.3333333335, ans=0.1 2023-11-26 21:36:12,444 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8250, loss[loss=0.05466, simple_loss=0.06975, pruned_loss=0.006285, audio_tagging_loss=0.0135, over 15657.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09097, pruned_loss=0.01233, audio_tagging_loss=0.008547, over 3050696.36 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:36:19,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3582013.3333333335, ans=0.125 2023-11-26 21:36:43,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3582213.3333333335, ans=0.5 2023-11-26 21:37:03,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537350 2023-11-26 21:37:07,514 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8300, loss[loss=0.05516, simple_loss=0.07705, pruned_loss=0.009791, audio_tagging_loss=0.006848, over 16506.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09086, pruned_loss=0.01214, audio_tagging_loss=0.008562, over 3056852.93 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:37:07,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3582346.6666666665, ans=0.125 2023-11-26 21:37:20,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.995e+01 9.587e+01 1.028e+02 1.257e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 21:37:21,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.78 vs. 
limit=15.0 2023-11-26 21:37:26,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3582413.3333333335, ans=0.0 2023-11-26 21:37:35,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3582480.0, ans=0.5 2023-11-26 21:37:43,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=15.0 2023-11-26 21:37:57,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3582613.3333333335, ans=0.0 2023-11-26 21:37:58,401 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537400 2023-11-26 21:38:00,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3582613.3333333335, ans=0.5 2023-11-26 21:38:00,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3582613.3333333335, ans=0.0 2023-11-26 21:38:02,865 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8350, loss[loss=0.08852, simple_loss=0.1228, pruned_loss=0.02018, audio_tagging_loss=0.006921, over 15337.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09029, pruned_loss=0.01211, audio_tagging_loss=0.008484, over 3060258.37 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:38:13,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3582746.6666666665, ans=0.125 2023-11-26 21:38:26,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3582813.3333333335, ans=0.0 2023-11-26 21:38:30,582 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:38:32,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3582813.3333333335, ans=0.125 2023-11-26 21:38:43,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3582880.0, ans=0.1 2023-11-26 21:38:54,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537450 2023-11-26 21:38:57,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3582946.6666666665, ans=0.0 2023-11-26 21:38:59,489 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8400, loss[loss=0.06795, simple_loss=0.1043, pruned_loss=0.009478, audio_tagging_loss=0.0063, over 16228.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.09016, pruned_loss=0.01192, audio_tagging_loss=0.008525, over 3065258.54 frames. 
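The two WARNING records above drop AudioSet placeholder cuts: after subsampling, only 23 encoder frames remain for 24 BPE tokens, and a pruned transducer cannot align fewer frames than tokens. A sketch of such a filter, assuming the subsampling arithmetic `((num_frames - 7) // 2 + 1) // 2` (one formula that reproduces 100 → 23); `keep_cut` and the model path are illustrative names, not the recipe's API:

```python
import logging

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("bpe.model")  # placeholder path; the run loads its own BPE model


def keep_cut(cut) -> bool:
    """Drop cuts whose subsampled frame count cannot cover the token
    sequence, mirroring the 'Exclude cut with ID ...' warnings."""
    num_frames = cut.num_frames            # before subsampling, e.g. 100
    T = ((num_frames - 7) // 2 + 1) // 2   # after 4x subsampling, e.g. 23
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if T < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (before subsampling): {num_frames}. "
            f"Number of frames (after subsampling): {T}. "
            f"Tokens: {tokens}. Number of tokens: {len(tokens)}"
        )
        return False
    return True


# typically applied lazily to the cut set, e.g.:
# train_cuts = train_cuts.filter(keep_cut)
```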
], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:39:11,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3583080.0, ans=0.07 2023-11-26 21:39:13,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.925e+01 9.429e+01 9.938e+01 1.352e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 21:39:22,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3583146.6666666665, ans=0.125 2023-11-26 21:39:27,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3583146.6666666665, ans=0.125 2023-11-26 21:39:33,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3583213.3333333335, ans=0.2 2023-11-26 21:39:50,058 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537500 2023-11-26 21:39:54,179 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8450, loss[loss=0.05676, simple_loss=0.07388, pruned_loss=0.009871, audio_tagging_loss=0.009947, over 14335.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08976, pruned_loss=0.01194, audio_tagging_loss=0.008595, over 3055450.42 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:40:05,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3583413.3333333335, ans=0.0 2023-11-26 21:40:44,875 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537550 2023-11-26 21:40:47,184 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:40:49,054 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8500, loss[loss=0.06863, simple_loss=0.09534, pruned_loss=0.01331, audio_tagging_loss=0.007658, over 16155.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08921, pruned_loss=0.01189, audio_tagging_loss=0.008666, over 3049603.89 frames. 
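The `grad_scale` field doubles from 16.0 to 32.0 at batch 8000 above and halves again later in this section (down to 8.0 at times), which is the usual fp16 loss-scaling dynamic: grow the scale after a run of overflow-free steps, back off on inf/nan. A sketch using PyTorch's stock `torch.cuda.amp.GradScaler`; the constants and the `train_step` wrapper are illustrative, and the run's own scaler handling may differ:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # illustrative; matches the logged grad_scale values
    growth_factor=2.0,     # doubling produces the 16.0 -> 32.0 jumps
    backoff_factor=0.5,    # halving produces the 32.0 -> 16.0 -> 8.0 drops
    growth_interval=2000,  # overflow-free steps required before growing
)


def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if the grads overflowed
    scaler.update()         # grows or backs off the scale
    return loss.detach(), scaler.get_scale()
```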
], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:40:52,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3583680.0, ans=0.07 2023-11-26 21:41:00,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3583746.6666666665, ans=0.125 2023-11-26 21:41:03,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3583746.6666666665, ans=0.125 2023-11-26 21:41:03,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3583746.6666666665, ans=0.125 2023-11-26 21:41:04,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.764e+01 9.533e+01 1.022e+02 1.336e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 21:41:16,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3583813.3333333335, ans=0.0 2023-11-26 21:41:23,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3583880.0, ans=0.1 2023-11-26 21:41:28,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3583880.0, ans=0.0 2023-11-26 21:41:40,585 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537600 2023-11-26 21:41:43,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3583946.6666666665, ans=0.2 2023-11-26 21:41:45,588 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8550, loss[loss=0.06453, simple_loss=0.09849, pruned_loss=0.009839, audio_tagging_loss=0.005444, over 15568.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08962, pruned_loss=0.01204, audio_tagging_loss=0.008636, over 3057678.22 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:41:45,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3584013.3333333335, ans=0.125 2023-11-26 21:41:58,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3584080.0, ans=0.0 2023-11-26 21:42:00,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3584080.0, ans=10.0 2023-11-26 21:42:17,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3584213.3333333335, ans=0.125 2023-11-26 21:42:37,276 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537650 2023-11-26 21:42:41,434 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8600, loss[loss=0.07724, simple_loss=0.1027, pruned_loss=0.01583, audio_tagging_loss=0.01004, over 14972.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08886, pruned_loss=0.01189, audio_tagging_loss=0.00882, over 3052044.13 frames. 
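Each `Whitening:` record compares a per-module anisotropy metric against a scheduled limit (e.g. `metric=6.33 vs. limit=12.0`), presumably penalizing the activations only once the metric exceeds the limit. One metric with the right behaviour is d·tr(C²)/tr(C)², which equals 1 for a perfectly white (isotropic) channel covariance C and grows as a few directions dominate; this particular definition is an assumption, not lifted from scaling.py:

```python
import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Anisotropy score per the assumed
    definition d * tr(C^2) / tr(C)^2, averaged over channel groups;
    1.0 means the covariance is already proportional to the identity."""
    n, c = x.shape
    d = c // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * d:(g + 1) * d]
        xg = xg - xg.mean(dim=0)
        cov = (xg.T @ xg) / n
        metrics.append(d * (cov @ cov).trace() / cov.trace() ** 2)
    return torch.stack(metrics).mean().item()


x = torch.randn(1000, 384)
print(whitening_metric(x))   # close to 1 for white-ish random features
x[:, :8] *= 5.0              # a few dominant channels
print(whitening_metric(x))   # metric rises toward the logged limits
```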
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:42:53,408 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:42:55,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.837e+01 9.386e+01 1.001e+02 1.418e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 21:42:56,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3584413.3333333335, ans=0.0 2023-11-26 21:42:57,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3584413.3333333335, ans=0.125 2023-11-26 21:43:14,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3584546.6666666665, ans=0.2 2023-11-26 21:43:20,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3584546.6666666665, ans=0.0 2023-11-26 21:43:29,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3584613.3333333335, ans=0.125 2023-11-26 21:43:32,610 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537700 2023-11-26 21:43:36,831 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8650, loss[loss=0.05669, simple_loss=0.07707, pruned_loss=0.008555, audio_tagging_loss=0.009598, over 14785.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08953, pruned_loss=0.0119, audio_tagging_loss=0.008819, over 3052185.89 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:43:40,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3584680.0, ans=0.125 2023-11-26 21:43:50,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3584746.6666666665, ans=0.0 2023-11-26 21:43:54,637 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:44:05,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3584813.3333333335, ans=0.125 2023-11-26 21:44:11,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3584880.0, ans=0.125 2023-11-26 21:44:14,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2023-11-26 21:44:20,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3584946.6666666665, ans=0.0 2023-11-26 21:44:28,159 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537750 2023-11-26 21:44:33,343 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8700, loss[loss=0.04478, simple_loss=0.05362, pruned_loss=0.00595, audio_tagging_loss=0.01202, over 14442.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08914, pruned_loss=0.01189, audio_tagging_loss=0.008901, over 3059338.83 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:44:41,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.02 vs. 
limit=15.0 2023-11-26 21:44:49,091 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.138e+01 9.810e+01 1.049e+02 1.289e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-26 21:45:20,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3585280.0, ans=0.1 2023-11-26 21:45:23,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3585280.0, ans=0.125 2023-11-26 21:45:24,815 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537800 2023-11-26 21:45:29,262 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8750, loss[loss=0.08954, simple_loss=0.1179, pruned_loss=0.02143, audio_tagging_loss=0.00917, over 15825.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08999, pruned_loss=0.01205, audio_tagging_loss=0.008931, over 3057562.73 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:45:41,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.62 vs. limit=10.0 2023-11-26 21:46:17,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2023-11-26 21:46:20,575 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537850 2023-11-26 21:46:20,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3585613.3333333335, ans=0.2 2023-11-26 21:46:24,668 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8800, loss[loss=0.06863, simple_loss=0.09474, pruned_loss=0.0127, audio_tagging_loss=0.008555, over 14798.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08978, pruned_loss=0.0121, audio_tagging_loss=0.008988, over 3056514.39 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:46:28,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3585680.0, ans=0.125 2023-11-26 21:46:32,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3585680.0, ans=0.0 2023-11-26 21:46:40,776 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.993e+01 9.548e+01 1.016e+02 1.284e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 21:46:52,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3585813.3333333335, ans=0.125 2023-11-26 21:47:07,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3585880.0, ans=0.125 2023-11-26 21:47:15,759 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537900 2023-11-26 21:47:20,534 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8850, loss[loss=0.07996, simple_loss=0.1163, pruned_loss=0.01709, audio_tagging_loss=0.004726, over 16088.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08982, pruned_loss=0.01208, audio_tagging_loss=0.008933, over 3060024.25 frames. 
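The `WithLoss: name=..., loss-sum=0.000e+00` records above attach an auxiliary penalty to the attention weights and log its sum, which here is exactly zero. A guessed mechanism under that reading; the real scaling.py version folds the penalty in through a custom autograd function rather than a global list:

```python
import torch

_aux_losses: list[torch.Tensor] = []


def with_loss(x: torch.Tensor, penalty: torch.Tensor, name: str) -> torch.Tensor:
    """Identity on x; records `penalty` so the training loop can add it
    to the total loss, and logs its sum as in the `WithLoss:` records."""
    print(f"WithLoss: name={name}, loss-sum={penalty.sum().item():.3e}")
    _aux_losses.append(penalty.sum())
    return x


attn_weights = torch.rand(4, 16, 16, requires_grad=True)
# e.g. penalize attention mass only above some cap, so the penalty is
# usually zero -- consistent with the logged loss-sum=0.000e+00:
penalty = torch.clamp(attn_weights - 1.0, min=0.0)
attn_weights = with_loss(attn_weights, penalty, "self_attn_weights")
# later: loss = main_loss + sum(_aux_losses); _aux_losses.clear()
```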
], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:47:23,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3586013.3333333335, ans=0.5 2023-11-26 21:47:33,210 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:47:56,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3586213.3333333335, ans=0.1 2023-11-26 21:47:57,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3586213.3333333335, ans=0.07 2023-11-26 21:48:12,675 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537950 2023-11-26 21:48:14,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3586280.0, ans=0.125 2023-11-26 21:48:16,835 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8900, loss[loss=0.06576, simple_loss=0.1034, pruned_loss=0.008316, audio_tagging_loss=0.005733, over 16520.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09037, pruned_loss=0.01224, audio_tagging_loss=0.008808, over 3057712.97 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:48:20,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2023-11-26 21:48:33,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.001e+01 9.576e+01 1.032e+02 1.288e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 21:48:34,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3586413.3333333335, ans=0.125 2023-11-26 21:48:36,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3586413.3333333335, ans=0.1 2023-11-26 21:48:41,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.44 vs. 
limit=22.5 2023-11-26 21:48:44,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3586480.0, ans=0.125 2023-11-26 21:48:51,115 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:48:58,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3586546.6666666665, ans=0.125 2023-11-26 21:48:59,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3586546.6666666665, ans=0.125 2023-11-26 21:49:07,869 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538000 2023-11-26 21:49:08,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3586613.3333333335, ans=0.125 2023-11-26 21:49:12,801 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8950, loss[loss=0.06786, simple_loss=0.09176, pruned_loss=0.01308, audio_tagging_loss=0.008909, over 15047.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08987, pruned_loss=0.0122, audio_tagging_loss=0.008704, over 3054342.04 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:49:33,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3586746.6666666665, ans=0.2 2023-11-26 21:49:34,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3586813.3333333335, ans=0.0 2023-11-26 21:49:48,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3586880.0, ans=0.0 2023-11-26 21:49:52,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.41 vs. limit=12.0 2023-11-26 21:50:03,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538050 2023-11-26 21:50:08,017 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9000, loss[loss=0.06945, simple_loss=0.09796, pruned_loss=0.01385, audio_tagging_loss=0.006625, over 14537.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09021, pruned_loss=0.0123, audio_tagging_loss=0.008552, over 3053638.79 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:50:08,017 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 21:50:40,486 INFO [train_asr.py:1267] (1/4) Epoch 45, validation: loss=0.05836, simple_loss=0.0505, pruned_loss=0.005274, audio_tagging_loss=0.02784, over 4681554.00 frames. 
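The validation pass just above (train_asr.py:1258/1267) reports a single frame-weighted loss over the full dev set, and the record that follows reports peak CUDA memory. A sketch of that aggregation, assuming per-batch losses are combined weighted by frame counts; `compute_loss` is a stand-in, and the MB unit convention is assumed:

```python
import torch


def compute_validation_loss(model, valid_dl, compute_loss):
    """Frame-weighted average of per-batch losses over the dev set,
    matching the 'validation: loss=... over N frames' style of report.
    `compute_loss` is assumed to return (loss_sum, num_frames)."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss_sum, num_frames = compute_loss(model, batch)
            tot_loss += loss_sum.item()
            tot_frames += num_frames
    print(f"validation: loss={tot_loss / tot_frames:.4g}, "
          f"over {tot_frames:.2f} frames.")
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
    return tot_loss / tot_frames
```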
2023-11-26 21:50:40,486 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 21:50:41,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3587013.3333333335, ans=0.0 2023-11-26 21:50:48,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3587013.3333333335, ans=0.125 2023-11-26 21:50:56,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3587080.0, ans=0.125 2023-11-26 21:50:56,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.868e+01 9.363e+01 9.972e+01 1.329e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 21:51:05,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3587146.6666666665, ans=0.125 2023-11-26 21:51:09,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0 2023-11-26 21:51:11,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3587146.6666666665, ans=0.5 2023-11-26 21:51:27,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3587280.0, ans=0.2 2023-11-26 21:51:31,223 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538100 2023-11-26 21:51:35,386 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9050, loss[loss=0.05733, simple_loss=0.06684, pruned_loss=0.01403, audio_tagging_loss=0.00988, over 16812.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08968, pruned_loss=0.01218, audio_tagging_loss=0.008495, over 3059480.05 frames. ], batch size: 65, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:52:26,853 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538150 2023-11-26 21:52:31,948 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9100, loss[loss=0.07479, simple_loss=0.1109, pruned_loss=0.01346, audio_tagging_loss=0.005869, over 15373.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.0894, pruned_loss=0.01201, audio_tagging_loss=0.00851, over 3055495.65 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:52:38,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3587680.0, ans=0.2 2023-11-26 21:52:49,591 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.688e+01 9.524e+01 1.031e+02 1.397e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 21:52:56,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.41 vs. limit=10.0 2023-11-26 21:53:06,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. 
limit=15.0 2023-11-26 21:53:19,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3587946.6666666665, ans=0.125 2023-11-26 21:53:23,845 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538200 2023-11-26 21:53:28,232 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9150, loss[loss=0.06473, simple_loss=0.07937, pruned_loss=0.01589, audio_tagging_loss=0.009162, over 14665.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08951, pruned_loss=0.01205, audio_tagging_loss=0.008498, over 3053105.78 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:53:37,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3588080.0, ans=0.125 2023-11-26 21:53:42,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3588080.0, ans=0.125 2023-11-26 21:53:48,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-26 21:53:49,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3588146.6666666665, ans=0.125 2023-11-26 21:54:14,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3588280.0, ans=0.125 2023-11-26 21:54:19,286 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538250 2023-11-26 21:54:23,513 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9200, loss[loss=0.0871, simple_loss=0.1168, pruned_loss=0.01989, audio_tagging_loss=0.008783, over 14818.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08979, pruned_loss=0.01214, audio_tagging_loss=0.00859, over 3052946.44 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:54:24,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3588346.6666666665, ans=0.125 2023-11-26 21:54:42,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.859e+01 9.629e+01 1.034e+02 1.503e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-26 21:55:10,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3588613.3333333335, ans=0.1 2023-11-26 21:55:15,024 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538300 2023-11-26 21:55:15,227 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:55:19,667 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9250, loss[loss=0.0541, simple_loss=0.07422, pruned_loss=0.009674, audio_tagging_loss=0.007315, over 15225.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08863, pruned_loss=0.01202, audio_tagging_loss=0.008642, over 3052720.43 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:55:23,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=12.0 2023-11-26 21:55:23,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.73 vs. 
limit=15.0 2023-11-26 21:55:28,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0 2023-11-26 21:55:38,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=15.0 2023-11-26 21:55:38,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2023-11-26 21:55:51,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3588880.0, ans=0.02 2023-11-26 21:56:11,859 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538350 2023-11-26 21:56:15,965 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9300, loss[loss=0.05864, simple_loss=0.07603, pruned_loss=0.01117, audio_tagging_loss=0.009454, over 16142.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08901, pruned_loss=0.01214, audio_tagging_loss=0.008607, over 3047695.27 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:56:27,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.17 vs. limit=22.5 2023-11-26 21:56:32,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.799e+01 9.431e+01 1.003e+02 1.401e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 21:57:07,062 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538400 2023-11-26 21:57:08,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3589280.0, ans=0.125 2023-11-26 21:57:09,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3589280.0, ans=0.2 2023-11-26 21:57:11,535 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9350, loss[loss=0.06662, simple_loss=0.09119, pruned_loss=0.01411, audio_tagging_loss=0.00691, over 15784.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.0892, pruned_loss=0.01222, audio_tagging_loss=0.008514, over 3050450.45 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:57:15,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3589346.6666666665, ans=0.0 2023-11-26 21:57:19,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3589346.6666666665, ans=0.125 2023-11-26 21:57:41,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3589480.0, ans=0.2 2023-11-26 21:57:52,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2023-11-26 21:57:53,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3589546.6666666665, ans=0.125 2023-11-26 21:57:58,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3589613.3333333335, ans=0.125 2023-11-26 21:57:59,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3589613.3333333335, ans=0.2 2023-11-26 21:58:02,231 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538450 2023-11-26 21:58:06,458 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9400, loss[loss=0.09323, simple_loss=0.1374, pruned_loss=0.01793, audio_tagging_loss=0.006596, over 15505.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08932, pruned_loss=0.01217, audio_tagging_loss=0.008661, over 3048014.50 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:58:18,018 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:58:25,069 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 9.009e+01 9.595e+01 1.056e+02 1.388e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 21:58:27,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3589746.6666666665, ans=0.125 2023-11-26 21:58:55,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=12.0 2023-11-26 21:58:58,859 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538500 2023-11-26 21:59:03,567 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9450, loss[loss=0.05823, simple_loss=0.0784, pruned_loss=0.00952, audio_tagging_loss=0.009512, over 15772.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08977, pruned_loss=0.01225, audio_tagging_loss=0.008732, over 3043447.99 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:59:03,590 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:59:16,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3590080.0, ans=0.125 2023-11-26 21:59:16,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3590080.0, ans=0.2 2023-11-26 21:59:16,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. 
limit=15.0 2023-11-26 21:59:21,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3590080.0, ans=0.2 2023-11-26 21:59:24,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3590146.6666666665, ans=0.2 2023-11-26 21:59:30,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3590146.6666666665, ans=0.125 2023-11-26 21:59:32,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3590146.6666666665, ans=0.125 2023-11-26 21:59:38,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3590213.3333333335, ans=0.125 2023-11-26 21:59:41,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2023-11-26 21:59:55,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538550 2023-11-26 21:59:59,335 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9500, loss[loss=0.07209, simple_loss=0.1089, pruned_loss=0.009498, audio_tagging_loss=0.008131, over 15272.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09003, pruned_loss=0.01228, audio_tagging_loss=0.008782, over 3044121.26 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 22:00:11,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3590413.3333333335, ans=0.0 2023-11-26 22:00:12,284 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:00:17,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3590413.3333333335, ans=0.125 2023-11-26 22:00:18,519 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.889e+01 9.000e+01 9.693e+01 1.049e+02 2.337e+02, threshold=1.939e+02, percent-clipped=1.0 2023-11-26 22:00:25,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.66 vs. limit=10.0 2023-11-26 22:00:30,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.20 vs. limit=8.0 2023-11-26 22:00:39,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0 2023-11-26 22:00:46,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0 2023-11-26 22:00:50,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.69 vs. limit=6.0 2023-11-26 22:00:50,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538600 2023-11-26 22:00:55,220 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9550, loss[loss=0.05521, simple_loss=0.07762, pruned_loss=0.006692, audio_tagging_loss=0.009707, over 15427.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08954, pruned_loss=0.01202, audio_tagging_loss=0.008867, over 3049982.45 frames. 
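Most `Clipping_scale` reports in this section show `percent-clipped=0.0`; the record just above is an exception, and its numbers line up with the threshold-as-2×median reading from earlier:

```latex
\text{threshold} = 2.0 \times \underbrace{9.693\times10^{1}}_{\text{median}}
                 = 1.939\times10^{2},
\qquad
\max = 2.337\times10^{2} > \text{threshold}
\;\Rightarrow\; \text{percent-clipped} = 1.0 .
```

The 1.0 is presumably a percentage of the recent window, i.e. one clipped batch out of the last hundred or so.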
], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 22:01:06,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3590746.6666666665, ans=0.1 2023-11-26 22:01:16,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=12.0 2023-11-26 22:01:17,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3590813.3333333335, ans=0.5 2023-11-26 22:01:24,186 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:01:29,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3590880.0, ans=0.125 2023-11-26 22:01:40,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3590946.6666666665, ans=0.0 2023-11-26 22:01:47,416 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538650 2023-11-26 22:01:50,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3590946.6666666665, ans=0.125 2023-11-26 22:01:52,744 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9600, loss[loss=0.05848, simple_loss=0.08893, pruned_loss=0.005939, audio_tagging_loss=0.008075, over 15734.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08946, pruned_loss=0.01205, audio_tagging_loss=0.008892, over 3048047.53 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:02:01,260 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:02:10,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.846e+01 9.558e+01 1.014e+02 1.385e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 22:02:15,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3591146.6666666665, ans=0.0 2023-11-26 22:02:32,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3591213.3333333335, ans=0.125 2023-11-26 22:02:39,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=22.5 2023-11-26 22:02:43,717 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538700 2023-11-26 22:02:46,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3591280.0, ans=0.2 2023-11-26 22:02:47,908 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9650, loss[loss=0.08061, simple_loss=0.117, pruned_loss=0.01517, audio_tagging_loss=0.006932, over 15368.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08983, pruned_loss=0.01226, audio_tagging_loss=0.008787, over 3037241.68 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:02:55,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.61 vs. 
limit=15.0 2023-11-26 22:03:03,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3591413.3333333335, ans=0.1 2023-11-26 22:03:29,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3591546.6666666665, ans=0.0 2023-11-26 22:03:31,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3591613.3333333335, ans=0.125 2023-11-26 22:03:32,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.76 vs. limit=6.0 2023-11-26 22:03:38,573 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538750 2023-11-26 22:03:39,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3591613.3333333335, ans=0.2 2023-11-26 22:03:42,847 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9700, loss[loss=0.06963, simple_loss=0.09066, pruned_loss=0.01324, audio_tagging_loss=0.01107, over 14799.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09012, pruned_loss=0.01237, audio_tagging_loss=0.008636, over 3034198.58 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:03:44,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3591680.0, ans=0.2 2023-11-26 22:03:53,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3591746.6666666665, ans=0.1 2023-11-26 22:04:00,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3591746.6666666665, ans=0.125 2023-11-26 22:04:02,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.825e+01 9.473e+01 1.012e+02 1.378e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 22:04:15,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3591880.0, ans=0.0 2023-11-26 22:04:19,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3591880.0, ans=0.125 2023-11-26 22:04:26,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3591946.6666666665, ans=0.125 2023-11-26 22:04:29,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3591946.6666666665, ans=0.1 2023-11-26 22:04:30,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3591946.6666666665, ans=0.2 2023-11-26 22:04:34,586 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538800 2023-11-26 22:04:34,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3591946.6666666665, ans=0.0 2023-11-26 22:04:39,044 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9750, loss[loss=0.09132, simple_loss=0.1255, pruned_loss=0.02195, audio_tagging_loss=0.0066, over 15248.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08955, pruned_loss=0.01225, audio_tagging_loss=0.008535, over 3034981.97 frames. 
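Many of the scheduled names above parameterize balancers (`balancer1.prob`, `balancer2.min_abs`, `max_abs`, `min_positive`, ...). In the Zipformer these modules keep per-channel activation statistics inside target ranges, intervening on a given batch with probability `prob`. The sketch below only shows the statistics being checked, with illustrative default ranges; it omits the gradient-nudging correction that the real scaling.py module applies:

```python
import torch


def balancer_violations(x: torch.Tensor, min_positive: float = 0.05,
                        max_positive: float = 0.95, min_abs: float = 0.2,
                        max_abs: float = 100.0) -> dict:
    """x: (num_frames, num_channels). Counts channels falling outside
    the assumed target ranges that a Balancer would correct."""
    frac_pos = (x > 0).float().mean(dim=0)  # per-channel P(activation > 0)
    mean_abs = x.abs().mean(dim=0)          # per-channel mean magnitude
    return {
        "too_negative": (frac_pos < min_positive).sum().item(),
        "too_positive": (frac_pos > max_positive).sum().item(),
        "too_small": (mean_abs < min_abs).sum().item(),
        "too_large": (mean_abs > max_abs).sum().item(),
    }


print(balancer_violations(torch.randn(1000, 256)))
```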
], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:04:51,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3592080.0, ans=0.0 2023-11-26 22:04:56,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3592080.0, ans=0.125 2023-11-26 22:05:05,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3592146.6666666665, ans=0.125 2023-11-26 22:05:10,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.17 vs. limit=22.5 2023-11-26 22:05:14,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3592213.3333333335, ans=0.2 2023-11-26 22:05:16,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3592213.3333333335, ans=0.2 2023-11-26 22:05:19,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3592213.3333333335, ans=0.125 2023-11-26 22:05:30,186 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538850 2023-11-26 22:05:30,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3592280.0, ans=0.0 2023-11-26 22:05:34,344 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9800, loss[loss=0.07584, simple_loss=0.1027, pruned_loss=0.01362, audio_tagging_loss=0.01088, over 14968.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08999, pruned_loss=0.01226, audio_tagging_loss=0.008462, over 3035728.35 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:05:41,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3592346.6666666665, ans=0.125 2023-11-26 22:05:43,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3592346.6666666665, ans=0.2 2023-11-26 22:05:52,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.733e+01 9.432e+01 1.005e+02 1.366e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 22:06:10,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3592546.6666666665, ans=0.0 2023-11-26 22:06:25,735 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:06:25,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538900 2023-11-26 22:06:29,956 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9850, loss[loss=0.07105, simple_loss=0.1012, pruned_loss=0.01071, audio_tagging_loss=0.009745, over 15148.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08988, pruned_loss=0.01228, audio_tagging_loss=0.008488, over 3036715.42 frames. 
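Within this stretch the logged lr ticks down from 1.50e-03 to 1.49e-03: the schedule decays smoothly in both batch count and epoch, as icefall's Eden scheduler does. A sketch of that shape; the constants below are assumptions chosen so the output lands near the logged value, not values read from this run:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style schedule: smooth inverse-quartic decay in both the
    batch count and the (fractional) epoch.  All constants here are
    illustrative assumptions."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor


# With these assumed constants the lr drifts down on the third
# significant digit over thousands of batches, as in the log:
print(f"{eden_lr(0.045, 537000, 45.0):.3g}")  # -> 0.00148
```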
2023-11-26 22:06:29,956 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9850, loss[loss=0.07105, simple_loss=0.1012, pruned_loss=0.01071, audio_tagging_loss=0.009745, over 15148.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08988, pruned_loss=0.01228, audio_tagging_loss=0.008488, over 3036715.42 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0
2023-11-26 22:06:41,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3592746.6666666665, ans=0.125
2023-11-26 22:06:58,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.77 vs. limit=15.0
2023-11-26 22:07:04,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3592880.0, ans=10.0
2023-11-26 22:07:21,331 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538950
2023-11-26 22:07:26,012 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9900, loss[loss=0.07117, simple_loss=0.1028, pruned_loss=0.009581, audio_tagging_loss=0.01017, over 14964.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09098, pruned_loss=0.01238, audio_tagging_loss=0.008434, over 3046313.00 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0
2023-11-26 22:07:35,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3593013.3333333335, ans=0.125
2023-11-26 22:07:45,058 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 9.069e+01 9.666e+01 1.030e+02 3.243e+02, threshold=1.933e+02, percent-clipped=1.0
2023-11-26 22:07:51,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3593146.6666666665, ans=0.125
2023-11-26 22:07:58,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3593213.3333333335, ans=0.125
2023-11-26 22:08:16,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539000
2023-11-26 22:08:21,875 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9950, loss[loss=0.07956, simple_loss=0.1141, pruned_loss=0.01421, audio_tagging_loss=0.008313, over 14880.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09102, pruned_loss=0.01243, audio_tagging_loss=0.008451, over 3045149.70 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0
2023-11-26 22:08:28,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3593346.6666666665, ans=0.125
2023-11-26 22:08:44,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3593480.0, ans=0.1
2023-11-26 22:08:55,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3593546.6666666665, ans=0.0
2023-11-26 22:09:00,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3593546.6666666665, ans=0.04949747468305833
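
The optim.py:476 lines report five order statistics (min, 25%, median, 75%, max) of recent gradient norms. In every entry here the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.666e+01 = 1.933e+02 in the entry above, where the max of 3.243e+02 crossed the threshold and percent-clipped rose to 1.0. A rough reconstruction of that bookkeeping, assuming threshold = Clipping_scale * median and treating percent-clipped as the percentage of recent steps whose norm exceeded it; this is a sketch, not the actual optim.py code.

import numpy as np

def clipping_stats(grad_norms: np.ndarray, clipping_scale: float = 2.0):
    # Five order statistics of the recent gradient norms: min/25%/median/75%/max.
    quartiles = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * quartiles[2]            # 2.0 x median
    percent_clipped = 100.0 * np.mean(grad_norms > threshold)
    return quartiles, threshold, percent_clipped
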
2023-11-26 22:09:07,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.98 vs. limit=12.0
2023-11-26 22:09:12,896 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539050
2023-11-26 22:09:14,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3593613.3333333335, ans=0.125
2023-11-26 22:09:17,150 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10000, loss[loss=0.07623, simple_loss=0.1019, pruned_loss=0.01772, audio_tagging_loss=0.007568, over 14116.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09006, pruned_loss=0.01228, audio_tagging_loss=0.008466, over 3043837.28 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0
2023-11-26 22:09:18,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3593680.0, ans=0.125
2023-11-26 22:09:31,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3593746.6666666665, ans=0.1
2023-11-26 22:09:35,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.750e+01 9.330e+01 1.017e+02 1.273e+02, threshold=1.866e+02, percent-clipped=0.0
2023-11-26 22:09:42,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3593813.3333333335, ans=0.125
2023-11-26 22:09:56,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3593880.0, ans=0.0
2023-11-26 22:10:07,564 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539100
2023-11-26 22:10:11,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3594013.3333333335, ans=0.2
2023-11-26 22:10:12,290 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10050, loss[loss=0.06522, simple_loss=0.09195, pruned_loss=0.01084, audio_tagging_loss=0.008406, over 15780.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09061, pruned_loss=0.01229, audio_tagging_loss=0.008542, over 3041192.21 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 32.0
2023-11-26 22:10:23,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3594080.0, ans=0.2
2023-11-26 22:10:24,812 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 22:10:25,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0
2023-11-26 22:10:39,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3594146.6666666665, ans=0.125
2023-11-26 22:10:52,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.96 vs. limit=12.0
2023-11-26 22:11:03,172 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539150
2023-11-26 22:11:06,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3594346.6666666665, ans=0.0
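
grad_scale in the batch entries is the mixed-precision loss-scaling factor: it sat at 16.0 through batch 9950, doubled to 32.0 at batch 10000, and is back at 16.0 by batch 10100 just below. That trajectory matches dynamic loss scaling of the GradScaler variety, which doubles the scale after a long run of finite gradients and halves it when an overflow is detected; whether this run uses the stock PyTorch scaler or a custom one is not visible from the log. A toy version of those update rules:

class ToyGradScaler:
    """Dynamic loss scaling: halve on overflow, double after enough clean steps."""

    def __init__(self, scale: float = 16.0, growth_interval: int = 2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale /= 2.0      # e.g. 32.0 -> 16.0, as between batches 10050 and 10100
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps >= self.growth_interval:
                self.scale *= 2.0  # e.g. 16.0 -> 32.0, as at batch 10000
                self._clean_steps = 0
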
2023-11-26 22:11:07,342 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10100, loss[loss=0.06382, simple_loss=0.07955, pruned_loss=0.01205, audio_tagging_loss=0.012, over 15501.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09057, pruned_loss=0.01231, audio_tagging_loss=0.008489, over 3040771.00 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0
2023-11-26 22:11:27,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 9.131e+01 9.595e+01 1.046e+02 1.257e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-26 22:11:36,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0
2023-11-26 22:11:39,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.43 vs. limit=15.0
2023-11-26 22:11:41,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3594546.6666666665, ans=0.0
2023-11-26 22:11:46,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=22.5
2023-11-26 22:11:51,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3594613.3333333335, ans=0.1
2023-11-26 22:11:52,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=22.5
2023-11-26 22:11:53,699 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 22:11:58,615 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539200
2023-11-26 22:12:01,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3594613.3333333335, ans=0.0
2023-11-26 22:12:03,081 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10150, loss[loss=0.06694, simple_loss=0.09446, pruned_loss=0.0101, audio_tagging_loss=0.009608, over 16276.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09112, pruned_loss=0.01237, audio_tagging_loss=0.008584, over 3042984.53 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 16.0
2023-11-26 22:12:14,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3594746.6666666665, ans=0.0
2023-11-26 22:12:23,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3594746.6666666665, ans=0.125
2023-11-26 22:12:24,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0
2023-11-26 22:12:30,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0
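
The scaling.py:213 entries are periodic dumps of ScheduledFloat values: hyperparameters such as dropout probabilities, skip rates, and balancer bounds that are functions of the (here fractional) training batch count rather than constants, with ans giving the value in effect. A plausible reading is piecewise-linear interpolation between (batch_count, value) breakpoints, clamped beyond the last one; by batch_count ~3.59e6 most schedules have long since settled at their final values. A sketch under that assumption, with made-up breakpoints:

from bisect import bisect_right

class PiecewiseSchedule:
    """Value that varies with batch_count; linear between breakpoints, clamped outside."""

    def __init__(self, *points: tuple):
        self.points = sorted(points)               # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        i = bisect_right(xs, batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical dropout schedule decaying from 0.3 to a floor of 0.1:
dropout_p = PiecewiseSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3594746.67))  # -> 0.1, the long-run value, like ans=0.1 above
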
2023-11-26 22:12:30,754 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 22:12:39,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0
2023-11-26 22:12:53,681 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539250
2023-11-26 22:12:53,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3594946.6666666665, ans=0.0
2023-11-26 22:12:58,450 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10200, loss[loss=0.0799, simple_loss=0.1064, pruned_loss=0.01544, audio_tagging_loss=0.01128, over 14043.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09064, pruned_loss=0.01231, audio_tagging_loss=0.00874, over 3044661.75 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0
2023-11-26 22:13:10,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3595080.0, ans=0.125
2023-11-26 22:13:14,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3595080.0, ans=0.125
2023-11-26 22:13:18,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.041e+01 9.563e+01 1.048e+02 1.575e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-26 22:13:20,763 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 22:13:45,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3595280.0, ans=0.0
2023-11-26 22:13:49,481 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539300
2023-11-26 22:13:54,227 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10250, loss[loss=0.05783, simple_loss=0.07568, pruned_loss=0.0089, audio_tagging_loss=0.01108, over 14969.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09043, pruned_loss=0.01231, audio_tagging_loss=0.008813, over 3044005.55 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0
2023-11-26 22:13:54,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=12.0
2023-11-26 22:13:57,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3595346.6666666665, ans=0.035
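
The scaling.py:1022 lines come from a whitening regularizer: for each named activation, metric summarizes how far the feature covariance (over num_channels channels, split into num_groups groups) is from isotropic, and the vs. limit comparison suggests the penalty engages only when the metric exceeds its limit; most entries here, such as metric=4.45 vs. limit=12.0 above, sit safely below it. One standard such metric is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which is 1.0 for perfectly white features and grows with anisotropy; the sketch below uses that formula, which may well differ from this codebase's exact definition.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); measure covariance whiteness per channel group.
    num_frames, num_channels = x.shape
    c = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, c).transpose(0, 1)   # (groups, frames, c)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames                   # (groups, c, c)
    eigs = torch.linalg.eigvalsh(cov)                          # per-group eigenvalues
    # E[lambda^2] / E[lambda]^2 per group, averaged; equals 1.0 only when
    # all eigenvalues within a group are equal (i.e. the group is white).
    return float((eigs.pow(2).mean(dim=1) / eigs.mean(dim=1).pow(2)).mean())

# Only penalize when the measured metric exceeds its limit, mirroring the
# "metric=... vs. limit=..." comparisons in the log:
x = torch.randn(1000, 256)                                  # white random features
apply_penalty = whitening_metric(x, num_groups=1) > 12.0    # False: metric near 1.0
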
2023-11-26 22:13:57,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0
2023-11-26 22:13:59,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3595346.6666666665, ans=0.125
2023-11-26 22:14:00,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3595346.6666666665, ans=0.125
2023-11-26 22:14:00,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3595346.6666666665, ans=0.125
2023-11-26 22:14:13,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3595413.3333333335, ans=0.0
2023-11-26 22:14:37,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3595546.6666666665, ans=0.2
2023-11-26 22:14:41,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3595613.3333333335, ans=0.1
2023-11-26 22:14:45,470 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539350
2023-11-26 22:14:47,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3595613.3333333335, ans=0.125
2023-11-26 22:14:49,567 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10300, loss[loss=0.05882, simple_loss=0.07454, pruned_loss=0.009691, audio_tagging_loss=0.01186, over 14780.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08877, pruned_loss=0.01214, audio_tagging_loss=0.008926, over 3044054.23 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0
2023-11-26 22:15:10,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 9.184e+01 9.815e+01 1.071e+02 1.317e+02, threshold=1.963e+02, percent-clipped=0.0
2023-11-26 22:15:12,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3595813.3333333335, ans=0.125
2023-11-26 22:15:18,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.60 vs. limit=15.0
2023-11-26 22:15:33,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3595946.6666666665, ans=0.125
2023-11-26 22:15:41,417 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539400
2023-11-26 22:15:45,995 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10350, loss[loss=0.06226, simple_loss=0.08371, pruned_loss=0.01085, audio_tagging_loss=0.009553, over 15704.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08863, pruned_loss=0.0122, audio_tagging_loss=0.008991, over 3049459.01 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0
2023-11-26 22:15:51,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3596013.3333333335, ans=0.125
2023-11-26 22:15:52,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.25 vs.
limit=22.5 2023-11-26 22:15:57,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3596080.0, ans=0.0 2023-11-26 22:16:24,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3596213.3333333335, ans=0.1 2023-11-26 22:16:25,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5 2023-11-26 22:16:38,447 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539450 2023-11-26 22:16:39,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.14 vs. limit=5.0 2023-11-26 22:16:43,140 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10400, loss[loss=0.08467, simple_loss=0.1195, pruned_loss=0.01886, audio_tagging_loss=0.006077, over 15715.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08986, pruned_loss=0.01242, audio_tagging_loss=0.009083, over 3044209.18 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:16:51,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.62 vs. limit=10.0 2023-11-26 22:17:01,654 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:17:02,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.954e+01 9.594e+01 1.032e+02 1.312e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 22:17:30,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3596613.3333333335, ans=0.09899494936611666 2023-11-26 22:17:34,674 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539500 2023-11-26 22:17:34,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3596613.3333333335, ans=0.125 2023-11-26 22:17:38,822 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10450, loss[loss=0.05261, simple_loss=0.07178, pruned_loss=0.009485, audio_tagging_loss=0.007232, over 15198.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08926, pruned_loss=0.01235, audio_tagging_loss=0.00895, over 3043806.87 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:17:51,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. 
limit=15.0 2023-11-26 22:18:02,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3596813.3333333335, ans=0.0 2023-11-26 22:18:06,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3596813.3333333335, ans=0.025 2023-11-26 22:18:06,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3596813.3333333335, ans=0.0 2023-11-26 22:18:15,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3596880.0, ans=0.2 2023-11-26 22:18:22,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3596946.6666666665, ans=0.125 2023-11-26 22:18:29,977 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539550 2023-11-26 22:18:34,745 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10500, loss[loss=0.08558, simple_loss=0.1192, pruned_loss=0.01844, audio_tagging_loss=0.00756, over 15855.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08919, pruned_loss=0.01221, audio_tagging_loss=0.008811, over 3048655.77 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:18:40,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.38 vs. limit=22.5 2023-11-26 22:18:55,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.749e+01 9.296e+01 1.026e+02 1.262e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 22:18:56,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0 2023-11-26 22:19:00,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3597146.6666666665, ans=0.0 2023-11-26 22:19:18,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3597280.0, ans=0.125 2023-11-26 22:19:26,535 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539600 2023-11-26 22:19:30,991 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10550, loss[loss=0.05222, simple_loss=0.06754, pruned_loss=0.008499, audio_tagging_loss=0.009951, over 15438.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08937, pruned_loss=0.01219, audio_tagging_loss=0.008664, over 3044020.40 frames. 
], batch size: 60, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:19:32,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3597346.6666666665, ans=0.025 2023-11-26 22:19:40,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3597346.6666666665, ans=0.025 2023-11-26 22:19:40,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3597346.6666666665, ans=0.2 2023-11-26 22:19:43,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3597413.3333333335, ans=0.0 2023-11-26 22:19:44,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=22.5 2023-11-26 22:19:54,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3597480.0, ans=0.025 2023-11-26 22:20:02,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3597480.0, ans=0.0 2023-11-26 22:20:09,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3597546.6666666665, ans=0.0 2023-11-26 22:20:22,553 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539650 2023-11-26 22:20:26,731 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10600, loss[loss=0.09519, simple_loss=0.1314, pruned_loss=0.02403, audio_tagging_loss=0.005456, over 14741.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08923, pruned_loss=0.01221, audio_tagging_loss=0.008627, over 3040483.73 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:20:44,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0 2023-11-26 22:20:47,032 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.604e+01 9.125e+01 9.885e+01 1.207e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-26 22:20:52,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3597813.3333333335, ans=0.125 2023-11-26 22:21:09,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3597880.0, ans=0.125 2023-11-26 22:21:17,961 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539700 2023-11-26 22:21:22,188 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10650, loss[loss=0.06007, simple_loss=0.07517, pruned_loss=0.01314, audio_tagging_loss=0.009343, over 14450.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08987, pruned_loss=0.01242, audio_tagging_loss=0.008557, over 3039523.80 frames. 
], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:21:24,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3598013.3333333335, ans=0.2 2023-11-26 22:21:25,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3598013.3333333335, ans=0.125 2023-11-26 22:21:53,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3598146.6666666665, ans=0.0 2023-11-26 22:22:03,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3598213.3333333335, ans=0.125 2023-11-26 22:22:14,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539750 2023-11-26 22:22:17,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. limit=10.0 2023-11-26 22:22:18,309 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10700, loss[loss=0.0908, simple_loss=0.1343, pruned_loss=0.01754, audio_tagging_loss=0.006136, over 15639.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09039, pruned_loss=0.01243, audio_tagging_loss=0.008519, over 3043910.69 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:22:22,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3598346.6666666665, ans=0.2 2023-11-26 22:22:34,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=15.0 2023-11-26 22:22:37,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 8.850e+01 9.452e+01 1.010e+02 1.228e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 22:23:09,838 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539800 2023-11-26 22:23:11,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3598613.3333333335, ans=0.1 2023-11-26 22:23:14,297 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10750, loss[loss=0.08281, simple_loss=0.1141, pruned_loss=0.01941, audio_tagging_loss=0.006336, over 16500.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09034, pruned_loss=0.01234, audio_tagging_loss=0.008566, over 3049135.51 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:23:50,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3598880.0, ans=0.2 2023-11-26 22:23:59,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3598946.6666666665, ans=0.125 2023-11-26 22:24:02,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3598946.6666666665, ans=0.0 2023-11-26 22:24:05,266 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539850 2023-11-26 22:24:09,430 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10800, loss[loss=0.07899, simple_loss=0.1023, pruned_loss=0.01976, audio_tagging_loss=0.008077, over 15415.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.0895, pruned_loss=0.01211, audio_tagging_loss=0.008581, over 3049934.74 frames. 
], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:24:13,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3599013.3333333335, ans=0.0 2023-11-26 22:24:31,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.827e+01 9.312e+01 1.017e+02 1.289e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 22:24:42,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3599213.3333333335, ans=0.2 2023-11-26 22:24:58,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3599280.0, ans=0.025 2023-11-26 22:24:59,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3599280.0, ans=0.1 2023-11-26 22:25:01,009 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539900 2023-11-26 22:25:05,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.91 vs. limit=15.0 2023-11-26 22:25:06,354 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10850, loss[loss=0.065, simple_loss=0.09359, pruned_loss=0.01047, audio_tagging_loss=0.007742, over 15646.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09081, pruned_loss=0.01235, audio_tagging_loss=0.008548, over 3051810.20 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:25:09,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3599346.6666666665, ans=0.2 2023-11-26 22:25:16,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3599413.3333333335, ans=0.09899494936611666 2023-11-26 22:25:26,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3599413.3333333335, ans=0.2 2023-11-26 22:25:37,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3599546.6666666665, ans=0.125 2023-11-26 22:25:57,980 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539950 2023-11-26 22:26:00,071 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:26:01,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3599680.0, ans=0.0 2023-11-26 22:26:02,143 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10900, loss[loss=0.05759, simple_loss=0.07691, pruned_loss=0.008371, audio_tagging_loss=0.01076, over 15909.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09053, pruned_loss=0.01231, audio_tagging_loss=0.008577, over 3058975.73 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:26:05,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.99 vs. 
limit=12.0 2023-11-26 22:26:23,455 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 9.085e+01 9.626e+01 1.024e+02 1.281e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 22:26:29,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3599813.3333333335, ans=0.125 2023-11-26 22:26:31,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3599813.3333333335, ans=0.1 2023-11-26 22:26:32,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3599813.3333333335, ans=0.125 2023-11-26 22:26:33,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3599813.3333333335, ans=0.1 2023-11-26 22:26:52,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3599946.6666666665, ans=0.2 2023-11-26 22:26:53,217 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540000 2023-11-26 22:26:59,553 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10950, loss[loss=0.05881, simple_loss=0.07574, pruned_loss=0.00987, audio_tagging_loss=0.01106, over 16111.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08996, pruned_loss=0.01215, audio_tagging_loss=0.008599, over 3050353.37 frames. ], batch size: 63, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:27:00,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3600013.3333333335, ans=0.0 2023-11-26 22:27:08,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3600013.3333333335, ans=0.125 2023-11-26 22:27:20,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3600080.0, ans=0.05 2023-11-26 22:27:21,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3600146.6666666665, ans=0.125 2023-11-26 22:27:23,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3600146.6666666665, ans=0.125 2023-11-26 22:27:36,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3600213.3333333335, ans=0.125 2023-11-26 22:27:45,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3600280.0, ans=0.125 2023-11-26 22:27:50,439 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540050 2023-11-26 22:27:55,675 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11000, loss[loss=0.06048, simple_loss=0.0847, pruned_loss=0.008227, audio_tagging_loss=0.009899, over 14421.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08973, pruned_loss=0.01214, audio_tagging_loss=0.008656, over 3039984.55 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:27:56,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3600346.6666666665, ans=0.0 2023-11-26 22:28:07,283 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:28:07,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3600413.3333333335, ans=0.05 2023-11-26 22:28:17,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.990e+01 9.480e+01 9.957e+01 3.729e+02, threshold=1.896e+02, percent-clipped=1.0 2023-11-26 22:28:19,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3600480.0, ans=0.0 2023-11-26 22:28:25,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3600480.0, ans=0.2 2023-11-26 22:28:25,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3600480.0, ans=0.2 2023-11-26 22:28:33,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2023-11-26 22:28:36,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3600546.6666666665, ans=0.125 2023-11-26 22:28:38,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3600546.6666666665, ans=0.0 2023-11-26 22:28:47,282 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540100 2023-11-26 22:28:52,036 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11050, loss[loss=0.0767, simple_loss=0.0958, pruned_loss=0.01936, audio_tagging_loss=0.009441, over 14718.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08977, pruned_loss=0.01228, audio_tagging_loss=0.008793, over 3033483.43 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:29:29,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3600880.0, ans=0.2 2023-11-26 22:29:37,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3600946.6666666665, ans=0.1 2023-11-26 22:29:41,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3600946.6666666665, ans=0.1 2023-11-26 22:29:42,430 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540150 2023-11-26 22:29:46,546 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11100, loss[loss=0.05765, simple_loss=0.0734, pruned_loss=0.01295, audio_tagging_loss=0.008, over 15137.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08944, pruned_loss=0.01224, audio_tagging_loss=0.008882, over 3041668.16 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:30:05,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.43 vs. 
limit=15.0 2023-11-26 22:30:08,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.991e+01 9.032e+01 9.689e+01 1.034e+02 1.564e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 22:30:19,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3601213.3333333335, ans=0.125 2023-11-26 22:30:20,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3601213.3333333335, ans=0.07 2023-11-26 22:30:23,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2023-11-26 22:30:33,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2023-11-26 22:30:37,335 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540200 2023-11-26 22:30:41,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3601346.6666666665, ans=0.0 2023-11-26 22:30:42,448 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11150, loss[loss=0.06557, simple_loss=0.08366, pruned_loss=0.01265, audio_tagging_loss=0.01109, over 13425.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08996, pruned_loss=0.01245, audio_tagging_loss=0.008916, over 3044993.51 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:30:55,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3601413.3333333335, ans=0.04949747468305833 2023-11-26 22:31:17,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3601546.6666666665, ans=0.0 2023-11-26 22:31:18,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3601546.6666666665, ans=0.1 2023-11-26 22:31:28,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2023-11-26 22:31:33,875 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540250 2023-11-26 22:31:38,611 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11200, loss[loss=0.05503, simple_loss=0.07079, pruned_loss=0.006049, audio_tagging_loss=0.01359, over 16490.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08975, pruned_loss=0.01232, audio_tagging_loss=0.008945, over 3040030.13 frames. 
], batch size: 66, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:31:54,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3601746.6666666665, ans=0.125 2023-11-26 22:32:01,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.768e+01 9.515e+01 1.029e+02 1.320e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 22:32:01,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3601813.3333333335, ans=0.2 2023-11-26 22:32:12,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3601880.0, ans=0.04949747468305833 2023-11-26 22:32:17,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0 2023-11-26 22:32:21,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3601880.0, ans=0.125 2023-11-26 22:32:25,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3601946.6666666665, ans=0.2 2023-11-26 22:32:30,200 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540300 2023-11-26 22:32:34,396 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11250, loss[loss=0.06759, simple_loss=0.0962, pruned_loss=0.01085, audio_tagging_loss=0.008638, over 14936.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0893, pruned_loss=0.0121, audio_tagging_loss=0.00895, over 3040016.93 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:32:43,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3602013.3333333335, ans=0.125 2023-11-26 22:32:44,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3602080.0, ans=0.0 2023-11-26 22:33:05,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3602146.6666666665, ans=0.125 2023-11-26 22:33:25,537 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540350 2023-11-26 22:33:26,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3602280.0, ans=0.0 2023-11-26 22:33:29,735 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11300, loss[loss=0.05727, simple_loss=0.07579, pruned_loss=0.007507, audio_tagging_loss=0.01187, over 15912.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08913, pruned_loss=0.01225, audio_tagging_loss=0.008866, over 3047163.74 frames. 
], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:33:29,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3602346.6666666665, ans=0.125 2023-11-26 22:33:35,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3602346.6666666665, ans=0.125 2023-11-26 22:33:48,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3602413.3333333335, ans=0.1 2023-11-26 22:33:50,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2023-11-26 22:33:53,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3602480.0, ans=0.2 2023-11-26 22:33:53,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3602480.0, ans=0.0 2023-11-26 22:33:54,061 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.683e+01 9.336e+01 1.007e+02 1.340e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 22:33:58,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=15.0 2023-11-26 22:34:21,810 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540400 2023-11-26 22:34:26,372 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11350, loss[loss=0.04712, simple_loss=0.06368, pruned_loss=0.006863, audio_tagging_loss=0.008419, over 14827.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08914, pruned_loss=0.01218, audio_tagging_loss=0.008776, over 3038089.08 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:34:26,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3602680.0, ans=0.125 2023-11-26 22:34:29,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3602680.0, ans=0.125 2023-11-26 22:34:41,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-26 22:34:42,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3602746.6666666665, ans=0.0 2023-11-26 22:34:51,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3602813.3333333335, ans=0.2 2023-11-26 22:34:54,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3602813.3333333335, ans=0.125 2023-11-26 22:35:02,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2023-11-26 22:35:04,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3602880.0, ans=0.07 2023-11-26 22:35:08,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. 
limit=15.0 2023-11-26 22:35:17,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540450 2023-11-26 22:35:22,601 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11400, loss[loss=0.05776, simple_loss=0.08249, pruned_loss=0.007559, audio_tagging_loss=0.008958, over 14955.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08855, pruned_loss=0.01199, audio_tagging_loss=0.008651, over 3033948.21 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:35:33,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3603080.0, ans=0.5 2023-11-26 22:35:46,003 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 8.997e+01 9.516e+01 1.035e+02 1.684e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 22:35:46,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3603146.6666666665, ans=0.035 2023-11-26 22:35:52,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3603146.6666666665, ans=0.05 2023-11-26 22:35:55,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3603213.3333333335, ans=0.125 2023-11-26 22:35:55,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3603213.3333333335, ans=0.04949747468305833 2023-11-26 22:35:56,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3603213.3333333335, ans=0.1 2023-11-26 22:36:13,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540500 2023-11-26 22:36:17,738 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11450, loss[loss=0.07712, simple_loss=0.1124, pruned_loss=0.01457, audio_tagging_loss=0.006346, over 14488.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08902, pruned_loss=0.01205, audio_tagging_loss=0.008596, over 3038432.05 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:36:34,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3603413.3333333335, ans=0.0 2023-11-26 22:36:38,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3603413.3333333335, ans=0.2 2023-11-26 22:37:04,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3603613.3333333335, ans=0.1 2023-11-26 22:37:09,928 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540550 2023-11-26 22:37:14,191 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11500, loss[loss=0.04224, simple_loss=0.04823, pruned_loss=0.008229, audio_tagging_loss=0.009896, over 14994.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.088, pruned_loss=0.01188, audio_tagging_loss=0.008645, over 3039767.22 frames. 
], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:37:17,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3603680.0, ans=0.0 2023-11-26 22:37:26,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3603746.6666666665, ans=0.07 2023-11-26 22:37:37,306 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.979e+01 9.575e+01 1.038e+02 1.869e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 22:37:38,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-11-26 22:37:43,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3603813.3333333335, ans=0.125 2023-11-26 22:38:04,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3603946.6666666665, ans=0.125 2023-11-26 22:38:05,440 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540600 2023-11-26 22:38:05,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0 2023-11-26 22:38:09,902 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11550, loss[loss=0.07182, simple_loss=0.1032, pruned_loss=0.0146, audio_tagging_loss=0.005601, over 15293.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08791, pruned_loss=0.01186, audio_tagging_loss=0.008663, over 3037750.90 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:38:38,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3604146.6666666665, ans=0.125 2023-11-26 22:38:46,081 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:39:01,601 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540650 2023-11-26 22:39:05,795 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11600, loss[loss=0.07179, simple_loss=0.09802, pruned_loss=0.01326, audio_tagging_loss=0.009515, over 15019.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08851, pruned_loss=0.01205, audio_tagging_loss=0.008606, over 3042761.19 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:39:22,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.83 vs. 
limit=12.0 2023-11-26 22:39:30,778 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.938e+01 9.802e+01 1.035e+02 1.553e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-26 22:39:37,381 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:39:57,458 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540700 2023-11-26 22:39:57,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3604613.3333333335, ans=0.125 2023-11-26 22:40:02,174 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11650, loss[loss=0.05107, simple_loss=0.06526, pruned_loss=0.007403, audio_tagging_loss=0.01103, over 16369.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0901, pruned_loss=0.01241, audio_tagging_loss=0.008531, over 3038766.62 frames. ], batch size: 63, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:40:33,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3604813.3333333335, ans=0.05 2023-11-26 22:40:53,885 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540750 2023-11-26 22:40:58,082 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11700, loss[loss=0.04626, simple_loss=0.06201, pruned_loss=0.005264, audio_tagging_loss=0.009986, over 15491.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08959, pruned_loss=0.01212, audio_tagging_loss=0.008575, over 3043502.97 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:40:58,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.21 vs. limit=22.5 2023-11-26 22:41:11,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3605080.0, ans=0.125 2023-11-26 22:41:14,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3605080.0, ans=0.125 2023-11-26 22:41:23,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.888e+01 9.584e+01 1.031e+02 1.555e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 22:41:49,352 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540800 2023-11-26 22:41:54,393 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11750, loss[loss=0.0831, simple_loss=0.12, pruned_loss=0.01698, audio_tagging_loss=0.006119, over 15703.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08908, pruned_loss=0.01197, audio_tagging_loss=0.008608, over 3037766.76 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:41:54,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2023-11-26 22:41:57,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.38 vs. limit=10.0 2023-11-26 22:42:05,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. 
limit=15.0 2023-11-26 22:42:20,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3605480.0, ans=0.125 2023-11-26 22:42:25,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3605480.0, ans=0.2 2023-11-26 22:42:28,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3605546.6666666665, ans=0.125 2023-11-26 22:42:45,766 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540850 2023-11-26 22:42:49,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3605680.0, ans=0.125 2023-11-26 22:42:50,425 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11800, loss[loss=0.06759, simple_loss=0.09068, pruned_loss=0.01337, audio_tagging_loss=0.008881, over 16161.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08863, pruned_loss=0.01203, audio_tagging_loss=0.008737, over 3044780.95 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:42:55,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3605680.0, ans=0.125 2023-11-26 22:43:00,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3605746.6666666665, ans=0.1 2023-11-26 22:43:00,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3605746.6666666665, ans=0.125 2023-11-26 22:43:08,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3605746.6666666665, ans=0.125 2023-11-26 22:43:14,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.659e+01 9.498e+01 1.042e+02 1.310e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 22:43:37,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3605946.6666666665, ans=0.1 2023-11-26 22:43:42,132 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540900 2023-11-26 22:43:46,314 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11850, loss[loss=0.06274, simple_loss=0.08894, pruned_loss=0.009919, audio_tagging_loss=0.00835, over 16525.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08844, pruned_loss=0.01197, audio_tagging_loss=0.00881, over 3039840.18 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:44:18,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3606146.6666666665, ans=0.1 2023-11-26 22:44:19,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3606213.3333333335, ans=0.05 2023-11-26 22:44:28,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3606213.3333333335, ans=0.0 2023-11-26 22:44:31,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. 
limit=15.0 2023-11-26 22:44:34,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3606280.0, ans=0.2 2023-11-26 22:44:37,454 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540950 2023-11-26 22:44:38,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2023-11-26 22:44:40,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3606346.6666666665, ans=0.2 2023-11-26 22:44:41,651 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11900, loss[loss=0.06844, simple_loss=0.09982, pruned_loss=0.01179, audio_tagging_loss=0.006732, over 15187.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08844, pruned_loss=0.01205, audio_tagging_loss=0.008848, over 3042332.48 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:44:43,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.49 vs. limit=15.0 2023-11-26 22:44:45,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3606346.6666666665, ans=0.125 2023-11-26 22:45:00,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3606413.3333333335, ans=0.1 2023-11-26 22:45:07,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.979e+01 9.678e+01 1.018e+02 1.926e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-26 22:45:13,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3606480.0, ans=0.025 2023-11-26 22:45:24,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2023-11-26 22:45:30,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3606613.3333333335, ans=0.0 2023-11-26 22:45:33,274 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541000 2023-11-26 22:45:34,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3606613.3333333335, ans=0.1 2023-11-26 22:45:37,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.77 vs. limit=15.0 2023-11-26 22:45:38,257 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11950, loss[loss=0.07685, simple_loss=0.101, pruned_loss=0.01725, audio_tagging_loss=0.00912, over 15007.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08859, pruned_loss=0.01216, audio_tagging_loss=0.008955, over 3036789.83 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:45:38,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3606680.0, ans=0.125 2023-11-26 22:45:41,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.54 vs. 
limit=15.0 2023-11-26 22:45:44,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3606680.0, ans=0.125 2023-11-26 22:45:45,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3606680.0, ans=0.2 2023-11-26 22:45:46,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3606680.0, ans=0.2 2023-11-26 22:45:46,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3606680.0, ans=0.2 2023-11-26 22:45:59,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0 2023-11-26 22:46:06,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-11-26 22:46:14,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3606880.0, ans=0.0 2023-11-26 22:46:22,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3606946.6666666665, ans=0.0 2023-11-26 22:46:27,966 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541050 2023-11-26 22:46:32,000 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 12000, loss[loss=0.06851, simple_loss=0.09103, pruned_loss=0.01266, audio_tagging_loss=0.01033, over 15492.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08911, pruned_loss=0.01229, audio_tagging_loss=0.009031, over 3038107.51 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:46:32,001 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 22:47:04,435 INFO [train_asr.py:1267] (1/4) Epoch 45, validation: loss=0.05747, simple_loss=0.05048, pruned_loss=0.005268, audio_tagging_loss=0.02696, over 4681554.00 frames. 2023-11-26 22:47:04,436 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 22:47:10,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3607013.3333333335, ans=0.125 2023-11-26 22:47:12,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3607013.3333333335, ans=0.015 2023-11-26 22:47:13,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3607080.0, ans=0.0 2023-11-26 22:47:25,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2023-11-26 22:47:26,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 9.102e+01 9.829e+01 1.057e+02 1.323e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-26 22:47:57,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3607186.6666666665, ans=0.0 2023-11-26 22:47:58,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.47 vs. 
limit=15.0 2023-11-26 22:48:02,338 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 0, loss[loss=0.07397, simple_loss=0.08846, pruned_loss=0.01257, audio_tagging_loss=0.01717, over 16361.00 frames. ], tot_loss[loss=0.07397, simple_loss=0.08846, pruned_loss=0.01257, audio_tagging_loss=0.01717, over 16361.00 frames. ], batch size: 62, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:48:02,339 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 22:48:12,970 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6644, 5.0844, 5.5016, 4.8109], device='cuda:1') 2023-11-26 22:48:14,458 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0608, 3.7401, 3.5743, 3.5003, 4.1595, 4.1709, 4.2392, 4.2337], device='cuda:1') 2023-11-26 22:48:33,847 INFO [train_asr.py:1267] (1/4) Epoch 46, validation: loss=0.05779, simple_loss=0.05056, pruned_loss=0.005325, audio_tagging_loss=0.02718, over 4681554.00 frames. 2023-11-26 22:48:33,847 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 22:48:34,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3607186.6666666665, ans=0.125 2023-11-26 22:48:36,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3607186.6666666665, ans=0.2 2023-11-26 22:48:55,508 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541100 2023-11-26 22:48:55,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3607320.0, ans=0.125 2023-11-26 22:49:03,293 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:49:28,971 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 50, loss[loss=0.08776, simple_loss=0.1077, pruned_loss=0.01662, audio_tagging_loss=0.01728, over 15795.00 frames. ], tot_loss[loss=0.07432, simple_loss=0.09224, pruned_loss=0.01199, audio_tagging_loss=0.01621, over 687199.12 frames. ], batch size: 58, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:49:49,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2023-11-26 22:49:50,741 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541150 2023-11-26 22:50:01,564 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:50:03,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3607720.0, ans=0.09899494936611666 2023-11-26 22:50:04,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2023-11-26 22:50:05,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3607720.0, ans=0.125 2023-11-26 22:50:15,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.86 vs. 
limit=10.0 2023-11-26 22:50:20,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.136e+01 9.821e+01 1.049e+02 1.148e+02 1.594e+02, threshold=2.098e+02, percent-clipped=0.0 2023-11-26 22:50:21,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3607786.6666666665, ans=0.2 2023-11-26 22:50:24,342 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 100, loss[loss=0.05941, simple_loss=0.06033, pruned_loss=0.009172, audio_tagging_loss=0.02007, over 15869.00 frames. ], tot_loss[loss=0.07291, simple_loss=0.09045, pruned_loss=0.0119, audio_tagging_loss=0.01578, over 1208342.45 frames. ], batch size: 61, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:50:47,217 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541200 2023-11-26 22:50:49,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3607986.6666666665, ans=0.125 2023-11-26 22:51:10,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2023-11-26 22:51:15,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3608120.0, ans=0.125 2023-11-26 22:51:20,228 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 150, loss[loss=0.06987, simple_loss=0.09178, pruned_loss=0.01302, audio_tagging_loss=0.01095, over 16478.00 frames. ], tot_loss[loss=0.07122, simple_loss=0.09039, pruned_loss=0.012, audio_tagging_loss=0.01403, over 1610944.10 frames. ], batch size: 62, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:51:26,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3608186.6666666665, ans=0.0 2023-11-26 22:51:36,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.18 vs. 
limit=12.0 2023-11-26 22:51:43,308 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541250 2023-11-26 22:51:54,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3608386.6666666665, ans=0.0 2023-11-26 22:51:55,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3608386.6666666665, ans=0.125 2023-11-26 22:52:01,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608386.6666666665, ans=0.1 2023-11-26 22:52:04,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3608453.3333333335, ans=0.125 2023-11-26 22:52:04,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3608453.3333333335, ans=0.125 2023-11-26 22:52:12,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.648e+01 9.369e+01 9.812e+01 1.037e+02 1.267e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-26 22:52:13,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3608453.3333333335, ans=0.125 2023-11-26 22:52:16,849 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 200, loss[loss=0.05724, simple_loss=0.07815, pruned_loss=0.008974, audio_tagging_loss=0.009189, over 14924.00 frames. ], tot_loss[loss=0.07024, simple_loss=0.09145, pruned_loss=0.01222, audio_tagging_loss=0.01229, over 1932830.51 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:52:18,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3608520.0, ans=0.125 2023-11-26 22:52:30,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3608586.6666666665, ans=0.125 2023-11-26 22:52:36,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3608586.6666666665, ans=0.125 2023-11-26 22:52:38,603 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541300 2023-11-26 22:52:50,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608720.0, ans=0.1 2023-11-26 22:52:52,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3608720.0, ans=0.2 2023-11-26 22:52:54,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3608720.0, ans=0.07 2023-11-26 22:52:57,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3608720.0, ans=0.1 2023-11-26 22:52:59,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3608720.0, ans=0.0 2023-11-26 22:53:08,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3608786.6666666665, ans=0.2 2023-11-26 22:53:10,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3608786.6666666665, ans=0.0 2023-11-26 22:53:12,694 INFO [train_asr.py:1235] (1/4) 
Epoch 46, batch 250, loss[loss=0.08773, simple_loss=0.1263, pruned_loss=0.01921, audio_tagging_loss=0.005373, over 16059.00 frames. ], tot_loss[loss=0.06938, simple_loss=0.09179, pruned_loss=0.01236, audio_tagging_loss=0.01113, over 2181998.97 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:53:24,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3608920.0, ans=0.1 2023-11-26 22:53:35,471 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541350 2023-11-26 22:53:43,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3608986.6666666665, ans=0.2 2023-11-26 22:53:50,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3609053.3333333335, ans=0.2 2023-11-26 22:53:56,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3609053.3333333335, ans=0.125 2023-11-26 22:54:01,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3609120.0, ans=0.125 2023-11-26 22:54:05,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 9.060e+01 9.556e+01 1.038e+02 1.375e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 22:54:09,073 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 300, loss[loss=0.05553, simple_loss=0.06158, pruned_loss=0.01225, audio_tagging_loss=0.01248, over 14578.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.09106, pruned_loss=0.01225, audio_tagging_loss=0.01043, over 2371334.00 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:54:17,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3609186.6666666665, ans=0.125 2023-11-26 22:54:32,047 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541400 2023-11-26 22:54:36,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3609320.0, ans=0.125 2023-11-26 22:54:41,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2023-11-26 22:54:45,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3609386.6666666665, ans=0.0 2023-11-26 22:54:47,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3609386.6666666665, ans=0.125 2023-11-26 22:54:47,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=12.0 2023-11-26 22:54:50,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3609386.6666666665, ans=0.125 2023-11-26 22:55:04,955 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 350, loss[loss=0.09248, simple_loss=0.1275, pruned_loss=0.02091, audio_tagging_loss=0.007813, over 15506.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09099, pruned_loss=0.01229, audio_tagging_loss=0.009837, over 2523734.64 frames. 
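
A note on the recurring "ScheduledFloat: name=..., batch_count=..., ans=..." entries above and below: each names a regularization constant (dropout, skip, bypass and balancer probabilities and the like) whose current value ("ans") is scheduled as a function of a global batch count; by this point in training (batch_count around 3.6M) nearly all of them sit at their final values (0.125, 0.2, 0.1, ...). Note also that batch_count advances by roughly 6.67 per training batch (compare batch idx 540600 -> 540700 against batch_count 3603946.7 -> 3604613.3), i.e. it is a scaled count rather than the raw batch index, which is where the repeating .3333/.6666 decimals come from. Below is a minimal sketch of such a scheduled value, assuming simple piecewise-linear interpolation between (batch_count, value) breakpoints; the class name and breakpoints are illustrative, not the actual scaling.py code.

    # Sketch of a batch-count-scheduled constant, assuming piecewise-linear
    # interpolation between (batch_count, value) breakpoints. Illustration
    # only; class and breakpoints are hypothetical, not the scaling.py code.
    from bisect import bisect_right

    class ScheduledFloatSketch:
        def __init__(self, *points):           # points: sorted (batch_count, value)
            self.x = [p[0] for p in points]
            self.y = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect_right(self.x, batch_count)
            t = (batch_count - self.x[i - 1]) / (self.x[i] - self.x[i - 1])
            return self.y[i - 1] + t * (self.y[i] - self.y[i - 1])

    # A skip-rate that decays early in training, then stays at its floor:
    conv_skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.125))
    print(conv_skip_rate.value(3603813.0))     # long past the schedule -> 0.125
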
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:55:13,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3609520.0, ans=0.0 2023-11-26 22:55:17,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3609586.6666666665, ans=0.125 2023-11-26 22:55:17,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3609586.6666666665, ans=0.1 2023-11-26 22:55:27,478 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541450 2023-11-26 22:55:30,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3609653.3333333335, ans=0.0 2023-11-26 22:55:45,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3609720.0, ans=0.125 2023-11-26 22:55:57,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.928e+01 9.545e+01 1.017e+02 1.635e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 22:56:01,257 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 400, loss[loss=0.06044, simple_loss=0.07483, pruned_loss=0.01365, audio_tagging_loss=0.009376, over 14938.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.0901, pruned_loss=0.01222, audio_tagging_loss=0.009556, over 2639998.91 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:56:07,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3609853.3333333335, ans=0.0 2023-11-26 22:56:12,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3609920.0, ans=0.2 2023-11-26 22:56:19,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3609920.0, ans=0.0 2023-11-26 22:56:19,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3609920.0, ans=0.125 2023-11-26 22:56:23,725 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541500 2023-11-26 22:56:42,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3610053.3333333335, ans=0.0 2023-11-26 22:56:56,550 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 450, loss[loss=0.05668, simple_loss=0.08399, pruned_loss=0.005976, audio_tagging_loss=0.008707, over 14926.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09063, pruned_loss=0.01209, audio_tagging_loss=0.00923, over 2730857.56 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:57:00,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.30 vs. 
limit=15.0 2023-11-26 22:57:05,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3610186.6666666665, ans=0.125 2023-11-26 22:57:20,025 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541550 2023-11-26 22:57:49,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 9.008e+01 9.621e+01 1.046e+02 1.513e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 22:57:53,050 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 500, loss[loss=0.05299, simple_loss=0.06631, pruned_loss=0.009838, audio_tagging_loss=0.01, over 13508.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09013, pruned_loss=0.01208, audio_tagging_loss=0.009049, over 2796534.59 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:58:01,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3610520.0, ans=0.125 2023-11-26 22:58:15,242 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541600 2023-11-26 22:58:21,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3610653.3333333335, ans=0.0 2023-11-26 22:58:46,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3610786.6666666665, ans=0.1 2023-11-26 22:58:49,529 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 550, loss[loss=0.06749, simple_loss=0.09458, pruned_loss=0.01113, audio_tagging_loss=0.009071, over 15054.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08972, pruned_loss=0.01199, audio_tagging_loss=0.008998, over 2849782.71 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:58:51,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3610853.3333333335, ans=0.1 2023-11-26 22:59:02,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3610920.0, ans=0.0 2023-11-26 22:59:11,757 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541650 2023-11-26 22:59:19,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3610986.6666666665, ans=0.0 2023-11-26 22:59:28,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.71 vs. limit=10.0 2023-11-26 22:59:38,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3611120.0, ans=0.0 2023-11-26 22:59:42,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.846e+01 9.307e+01 1.018e+02 1.266e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 22:59:44,776 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 600, loss[loss=0.0522, simple_loss=0.05955, pruned_loss=0.008079, audio_tagging_loss=0.01435, over 13569.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08953, pruned_loss=0.01196, audio_tagging_loss=0.008899, over 2891778.66 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:00:07,208 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541700 2023-11-26 23:00:12,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2023-11-26 23:00:22,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3611386.6666666665, ans=0.0 2023-11-26 23:00:41,241 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 650, loss[loss=0.07083, simple_loss=0.1045, pruned_loss=0.009878, audio_tagging_loss=0.008694, over 15556.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.0902, pruned_loss=0.01198, audio_tagging_loss=0.008722, over 2920266.15 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:00:47,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3611520.0, ans=0.2 2023-11-26 23:00:51,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3611586.6666666665, ans=0.125 2023-11-26 23:01:03,896 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541750 2023-11-26 23:01:13,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3611653.3333333335, ans=0.125 2023-11-26 23:01:17,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3611720.0, ans=0.2 2023-11-26 23:01:31,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3611786.6666666665, ans=0.125 2023-11-26 23:01:35,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.767e+01 9.555e+01 1.054e+02 1.204e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 23:01:35,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3611786.6666666665, ans=0.125 2023-11-26 23:01:37,483 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 700, loss[loss=0.08446, simple_loss=0.1181, pruned_loss=0.01642, audio_tagging_loss=0.008977, over 15053.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09018, pruned_loss=0.01201, audio_tagging_loss=0.008774, over 2957158.32 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:01:43,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0 2023-11-26 23:01:55,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3611920.0, ans=0.125 2023-11-26 23:01:59,846 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541800 2023-11-26 23:02:07,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3611986.6666666665, ans=0.125 2023-11-26 23:02:13,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.53 vs. limit=10.0 2023-11-26 23:02:33,559 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 750, loss[loss=0.06204, simple_loss=0.07917, pruned_loss=0.01271, audio_tagging_loss=0.009739, over 14911.00 frames. 
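
A note on the "Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=..." entries: the five numbers read as min/25%/50%/75%/max of recently observed gradient norms, and in every entry above the reported threshold equals clipping_scale times the median (e.g. 2.0 x 9.575e+01 = 1.915e+02), so gradients are clipped against a self-tuning threshold derived from the recent norm distribution; percent-clipped is the fraction that exceeded it. The sketch below reproduces that report under exactly this min/quartile/max reading; it is illustrative only, not the optimizer's actual implementation.

    # Sketch of the clipping report, assuming the five numbers are
    # min/25%/50%/75%/max of recent gradient norms and that
    # threshold = clipping_scale * median (which all entries above satisfy).
    import torch

    def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        qs = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * qs[2].item()          # 2.0 * median
        pct = (recent_norms > threshold).float().mean().item() * 100.0
        print(f"Clipping_scale={clipping_scale}, grad-norm quartiles "
              + " ".join(f"{q.item():.3e}" for q in qs)
              + f", threshold={threshold:.3e}, percent-clipped={pct:.1f}")

    clipping_report(70.0 + 40.0 * torch.rand(200))         # synthetic norms
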
], tot_loss[loss=0.06609, simple_loss=0.09042, pruned_loss=0.01205, audio_tagging_loss=0.008833, over 2977087.66 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:02:33,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3612186.6666666665, ans=0.1 2023-11-26 23:02:45,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=12.0 2023-11-26 23:02:49,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3612253.3333333335, ans=0.2 2023-11-26 23:02:55,927 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541850 2023-11-26 23:02:57,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3612320.0, ans=0.1 2023-11-26 23:03:00,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3612320.0, ans=0.2 2023-11-26 23:03:11,048 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:03:13,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3612386.6666666665, ans=0.1 2023-11-26 23:03:16,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3612386.6666666665, ans=0.0 2023-11-26 23:03:22,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3612453.3333333335, ans=0.125 2023-11-26 23:03:27,824 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.015e+01 9.591e+01 1.028e+02 1.389e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 23:03:29,965 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 800, loss[loss=0.06826, simple_loss=0.08283, pruned_loss=0.01271, audio_tagging_loss=0.01413, over 14367.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09083, pruned_loss=0.01217, audio_tagging_loss=0.008876, over 2998008.30 frames. 
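
A note on the "WithLoss: name=..., loss-sum=0.000e+00" entries (e.g. at 23:03:11 above): they report an auxiliary penalty attached to a layer's attention weights, and loss-sum=0.000e+00 simply says the penalty is currently zero. One plausible mechanism, sketched under the assumption that the loss is attached via a pass-through autograd op; the class below is hypothetical, not icefall's actual WithLoss.

    # Hypothetical pass-through autograd op: forward returns x unchanged,
    # backward passes the incoming gradient through and additionally gives
    # the attached auxiliary loss a gradient of 1, so backprop through x
    # also minimizes the penalty. An assumed mechanism, not icefall's code.
    import torch

    class AttachLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, aux_loss):
            ctx.save_for_backward(aux_loss)
            return x

        @staticmethod
        def backward(ctx, grad_out):
            (aux_loss,) = ctx.saved_tensors
            return grad_out, torch.ones_like(aux_loss)

    x = torch.randn(4, 8, requires_grad=True)
    penalty = (x.pow(2).sum() - 30.0).relu()   # toy auxiliary loss
    y = AttachLoss.apply(x, penalty)
    y.sum().backward()                         # drives both y and the penalty
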
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:03:31,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3612520.0, ans=0.04949747468305833 2023-11-26 23:03:32,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3612520.0, ans=0.2 2023-11-26 23:03:52,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541900 2023-11-26 23:03:57,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3612653.3333333335, ans=0.125 2023-11-26 23:03:58,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3612653.3333333335, ans=0.125 2023-11-26 23:04:02,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3612720.0, ans=0.0 2023-11-26 23:04:05,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3612720.0, ans=0.125 2023-11-26 23:04:09,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3612720.0, ans=0.1 2023-11-26 23:04:13,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3612786.6666666665, ans=0.0 2023-11-26 23:04:20,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3612786.6666666665, ans=0.125 2023-11-26 23:04:25,722 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 850, loss[loss=0.05849, simple_loss=0.08056, pruned_loss=0.008939, audio_tagging_loss=0.009274, over 14405.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09028, pruned_loss=0.01215, audio_tagging_loss=0.008918, over 3006919.27 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:04:30,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3612853.3333333335, ans=0.125 2023-11-26 23:04:48,337 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541950 2023-11-26 23:05:02,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.41 vs. limit=15.0 2023-11-26 23:05:06,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3613053.3333333335, ans=0.125 2023-11-26 23:05:19,276 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.857e+01 9.463e+01 1.019e+02 1.516e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 23:05:21,981 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 900, loss[loss=0.06492, simple_loss=0.09403, pruned_loss=0.01224, audio_tagging_loss=0.005662, over 14806.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09077, pruned_loss=0.01203, audio_tagging_loss=0.008923, over 3017224.88 frames. 
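
A note on the "tot_loss[... over N frames]" figures: they behave like a frame-weighted running average with slow decay that restarts at each epoch boundary. At "Epoch 46, batch 0" above, tot_loss equals the single-batch loss; the frame count then grows (687199 frames by batch 50, 1208342 by batch 100) and plateaus near 3.04M frames. The toy tracker below reproduces that behaviour; the decay constant 0.995 is an assumption, chosen only because it matches the observed ~3M-frame plateau at roughly 15k frames per batch, and this is not the actual tracker used by train_asr.py.

    # Toy running tracker, assuming a decayed frame-weighted average.
    class RunningLoss:
        def __init__(self, decay: float = 0.995):   # assumed decay constant
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames, self.frames

    tracker = RunningLoss()                    # fresh at each epoch start
    print(tracker.update(0.07397, 16361.0))    # batch 0: tot_loss == batch loss
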
], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:05:23,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3613186.6666666665, ans=0.04949747468305833 2023-11-26 23:05:31,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3613186.6666666665, ans=0.1 2023-11-26 23:05:32,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3613253.3333333335, ans=0.0 2023-11-26 23:05:40,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3613253.3333333335, ans=15.0 2023-11-26 23:05:44,497 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542000 2023-11-26 23:06:07,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3613453.3333333335, ans=0.125 2023-11-26 23:06:14,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3613453.3333333335, ans=0.0 2023-11-26 23:06:18,754 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 950, loss[loss=0.05648, simple_loss=0.07536, pruned_loss=0.009681, audio_tagging_loss=0.009117, over 14163.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08996, pruned_loss=0.01207, audio_tagging_loss=0.008865, over 3026093.68 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:06:40,876 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542050 2023-11-26 23:06:43,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2023-11-26 23:07:02,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3613786.6666666665, ans=0.2 2023-11-26 23:07:03,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0 2023-11-26 23:07:10,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3613786.6666666665, ans=0.05 2023-11-26 23:07:11,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3613786.6666666665, ans=0.0 2023-11-26 23:07:13,293 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.708e+01 9.328e+01 1.031e+02 1.282e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 23:07:14,460 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1000, loss[loss=0.08448, simple_loss=0.1203, pruned_loss=0.01591, audio_tagging_loss=0.008439, over 15215.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08957, pruned_loss=0.01201, audio_tagging_loss=0.008798, over 3028250.88 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:07:14,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3613853.3333333335, ans=0.125 2023-11-26 23:07:23,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3613853.3333333335, ans=0.0 2023-11-26 23:07:23,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3613853.3333333335, ans=0.125 2023-11-26 23:07:23,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3613853.3333333335, ans=0.125 2023-11-26 23:07:37,438 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542100 2023-11-26 23:07:38,443 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:07:44,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=22.5 2023-11-26 23:07:46,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3613986.6666666665, ans=0.0 2023-11-26 23:08:10,457 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1050, loss[loss=0.05435, simple_loss=0.07293, pruned_loss=0.00913, audio_tagging_loss=0.008755, over 16422.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08945, pruned_loss=0.01211, audio_tagging_loss=0.008689, over 3030230.57 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:08:27,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3614253.3333333335, ans=0.5 2023-11-26 23:08:28,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3614253.3333333335, ans=0.125 2023-11-26 23:08:33,555 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542150 2023-11-26 23:09:05,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.633e+01 9.310e+01 1.034e+02 1.583e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 23:09:05,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3614520.0, ans=0.125 2023-11-26 23:09:06,530 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1100, loss[loss=0.05212, simple_loss=0.06579, pruned_loss=0.008687, audio_tagging_loss=0.01054, over 14756.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08969, pruned_loss=0.01213, audio_tagging_loss=0.008601, over 3036437.84 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:09:09,276 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:09:25,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.29 vs. limit=5.0 2023-11-26 23:09:25,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3614586.6666666665, ans=0.125 2023-11-26 23:09:29,054 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542200 2023-11-26 23:09:30,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3614653.3333333335, ans=0.0 2023-11-26 23:09:41,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3614720.0, ans=0.125 2023-11-26 23:09:47,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3614720.0, ans=0.125 2023-11-26 23:09:58,863 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:10:02,937 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1150, loss[loss=0.06313, simple_loss=0.0839, pruned_loss=0.01126, audio_tagging_loss=0.009917, over 14655.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08892, pruned_loss=0.01197, audio_tagging_loss=0.008539, over 3040036.80 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:10:03,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3614853.3333333335, ans=0.125 2023-11-26 23:10:05,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3614853.3333333335, ans=0.125 2023-11-26 23:10:10,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3614853.3333333335, ans=0.125 2023-11-26 23:10:25,033 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542250 2023-11-26 23:10:34,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3614986.6666666665, ans=0.1 2023-11-26 23:10:42,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3615053.3333333335, ans=0.0 2023-11-26 23:10:42,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3615053.3333333335, ans=0.125 2023-11-26 23:10:52,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3615120.0, ans=0.0 2023-11-26 23:10:57,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.828e+01 9.396e+01 9.982e+01 1.339e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 23:10:59,042 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1200, loss[loss=0.07023, simple_loss=0.0966, pruned_loss=0.01363, audio_tagging_loss=0.008303, over 13654.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08866, pruned_loss=0.01191, audio_tagging_loss=0.008525, over 3038501.24 frames. 
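
A note on the recurring WARNING "Exclude cut with ID unbalanced/... from training": AudioSet clips carry a placeholder transcript ("Dummy text added as a place holder..."), and a 1-second cut yields only 23 encoder frames after 4x subsampling, fewer than its 24 BPE tokens, so the transducer loss could not align them and the cut is dropped. A sketch of that predicate follows; the front-end shrinkage formula is an assumption (it reproduces the logged 100 -> 23), and the names are illustrative rather than the actual train_asr.py code.

    # Sketch of the exclusion predicate, assuming a conv front-end that
    # shrinks T input frames to (T - 7) // 4 encoder frames.
    def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
        frames_after = (num_frames - 7) // subsampling_factor
        return frames_after >= num_tokens

    # The excluded AudioSet cuts above: 100 input frames -> 23 after
    # subsampling, but the placeholder transcript has 24 tokens -> dropped.
    print(keep_cut(100, 24))    # False
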
], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:11:04,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3615186.6666666665, ans=0.125 2023-11-26 23:11:20,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0 2023-11-26 23:11:21,961 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542300 2023-11-26 23:11:33,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3615386.6666666665, ans=0.0 2023-11-26 23:11:45,059 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:11:48,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3615453.3333333335, ans=0.125 2023-11-26 23:11:54,917 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1250, loss[loss=0.04776, simple_loss=0.0596, pruned_loss=0.007919, audio_tagging_loss=0.01004, over 14176.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.0883, pruned_loss=0.01191, audio_tagging_loss=0.008499, over 3043520.46 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:11:58,849 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:12:02,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3615520.0, ans=0.125 2023-11-26 23:12:08,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=22.5 2023-11-26 23:12:17,480 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542350 2023-11-26 23:12:21,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0 2023-11-26 23:12:25,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3615653.3333333335, ans=0.2 2023-11-26 23:12:26,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=12.0 2023-11-26 23:12:34,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3615720.0, ans=0.125 2023-11-26 23:12:37,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3615720.0, ans=0.0 2023-11-26 23:12:44,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3615786.6666666665, ans=0.2 2023-11-26 23:12:48,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3615786.6666666665, ans=0.04949747468305833 2023-11-26 23:12:49,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.892e+01 9.390e+01 1.002e+02 1.440e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 23:12:51,053 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1300, loss[loss=0.05997, simple_loss=0.08506, pruned_loss=0.01081, audio_tagging_loss=0.006632, over 13084.00 frames. 
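
A note on the "Whitening: name=..., metric=X vs. limit=Y" entries: the metric is a scalar measure of how far a module's output covariance is from white; it is 1.0 when all covariance eigenvalues are equal and grows as they spread, and the training code penalizes activations when it exceeds the stated limit. The sketch below assumes the definition metric = mean(eig(C)^2) / mean(eig(C))^2 = (tr(C^2)/d) / (tr(C)/d)^2, which has exactly that behaviour; it is illustrative, not scaling.py itself.

    # Sketch of a whitening metric for the feature covariance C:
    # exactly 1.0 for a white covariance, larger when eigenvalues spread.
    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        tr_c = cov.diagonal().sum()
        tr_c2 = torch.diagonal(cov @ cov).sum()
        return ((tr_c2 / d) / (tr_c / d) ** 2).item()

    x = torch.randn(4000, 384)            # roughly white features
    print(whitening_metric(x))            # ~1.1, far below e.g. limit=15.0
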
], tot_loss[loss=0.06392, simple_loss=0.08733, pruned_loss=0.01174, audio_tagging_loss=0.008511, over 3037501.45 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:12:54,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0 2023-11-26 23:13:07,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3615920.0, ans=0.2 2023-11-26 23:13:10,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3615920.0, ans=0.0 2023-11-26 23:13:12,841 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542400 2023-11-26 23:13:17,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3615986.6666666665, ans=0.125 2023-11-26 23:13:30,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3616053.3333333335, ans=0.1 2023-11-26 23:13:47,135 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1350, loss[loss=0.07, simple_loss=0.1017, pruned_loss=0.0119, audio_tagging_loss=0.007276, over 14530.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08769, pruned_loss=0.01188, audio_tagging_loss=0.008621, over 3043856.33 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:13:53,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3616186.6666666665, ans=0.125 2023-11-26 23:14:08,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3616320.0, ans=0.1 2023-11-26 23:14:09,905 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542450 2023-11-26 23:14:15,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3616320.0, ans=0.125 2023-11-26 23:14:26,370 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:14:41,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 9.028e+01 9.621e+01 1.020e+02 1.402e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 23:14:42,812 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1400, loss[loss=0.05921, simple_loss=0.06984, pruned_loss=0.01346, audio_tagging_loss=0.01083, over 14763.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08866, pruned_loss=0.01191, audio_tagging_loss=0.008655, over 3046076.87 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:15:05,797 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542500 2023-11-26 23:15:27,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3616786.6666666665, ans=0.5 2023-11-26 23:15:39,433 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1450, loss[loss=0.0644, simple_loss=0.07915, pruned_loss=0.01683, audio_tagging_loss=0.007988, over 14383.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08924, pruned_loss=0.01204, audio_tagging_loss=0.008677, over 3045743.60 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:15:44,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3616853.3333333335, ans=0.125 2023-11-26 23:15:45,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3616853.3333333335, ans=0.1 2023-11-26 23:16:01,434 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542550 2023-11-26 23:16:02,660 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:16:10,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3616986.6666666665, ans=0.125 2023-11-26 23:16:34,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.043e+01 9.029e+01 9.878e+01 1.085e+02 1.417e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-26 23:16:35,649 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1500, loss[loss=0.06267, simple_loss=0.07915, pruned_loss=0.0126, audio_tagging_loss=0.0105, over 15872.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08924, pruned_loss=0.01211, audio_tagging_loss=0.008744, over 3041168.86 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:16:35,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3617186.6666666665, ans=0.125 2023-11-26 23:16:58,176 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542600 2023-11-26 23:17:26,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3617453.3333333335, ans=0.125 2023-11-26 23:17:31,313 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1550, loss[loss=0.06349, simple_loss=0.08264, pruned_loss=0.009964, audio_tagging_loss=0.01221, over 14918.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.0884, pruned_loss=0.01188, audio_tagging_loss=0.008896, over 3042925.22 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:17:47,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3617586.6666666665, ans=0.0 2023-11-26 23:17:53,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3617653.3333333335, ans=0.0 2023-11-26 23:17:54,290 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542650 2023-11-26 23:18:01,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. 
limit=15.0 2023-11-26 23:18:03,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3617653.3333333335, ans=0.125 2023-11-26 23:18:04,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3617720.0, ans=0.125 2023-11-26 23:18:22,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3617786.6666666665, ans=0.125 2023-11-26 23:18:26,571 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.195e+01 9.800e+01 1.042e+02 1.280e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-26 23:18:27,651 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1600, loss[loss=0.05987, simple_loss=0.08133, pruned_loss=0.01036, audio_tagging_loss=0.008847, over 14932.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08879, pruned_loss=0.01192, audio_tagging_loss=0.008962, over 3042164.62 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:18:49,976 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542700 2023-11-26 23:19:00,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3618053.3333333335, ans=0.1 2023-11-26 23:19:06,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3618053.3333333335, ans=0.125 2023-11-26 23:19:15,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=15.0 2023-11-26 23:19:16,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=22.5 2023-11-26 23:19:18,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3618120.0, ans=15.0 2023-11-26 23:19:19,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3618120.0, ans=0.0 2023-11-26 23:19:24,002 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1650, loss[loss=0.06868, simple_loss=0.09546, pruned_loss=0.01001, audio_tagging_loss=0.01094, over 14864.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08864, pruned_loss=0.01194, audio_tagging_loss=0.008981, over 3039390.11 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:19:26,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3618186.6666666665, ans=0.125 2023-11-26 23:19:37,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.00 vs. 
limit=15.0 2023-11-26 23:19:41,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3618253.3333333335, ans=0.125 2023-11-26 23:19:45,836 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542750 2023-11-26 23:19:47,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3618320.0, ans=0.0 2023-11-26 23:19:51,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3618320.0, ans=0.0 2023-11-26 23:19:57,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3618386.6666666665, ans=6.0 2023-11-26 23:20:06,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3618386.6666666665, ans=0.2 2023-11-26 23:20:14,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.65 vs. limit=12.0 2023-11-26 23:20:19,453 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.999e+01 9.528e+01 1.009e+02 1.834e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 23:20:19,480 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1700, loss[loss=0.05345, simple_loss=0.07416, pruned_loss=0.008249, audio_tagging_loss=0.00812, over 15282.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08833, pruned_loss=0.01184, audio_tagging_loss=0.008928, over 3043200.83 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:20:25,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3618520.0, ans=0.0 2023-11-26 23:20:41,991 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542800 2023-11-26 23:20:45,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=22.5 2023-11-26 23:20:51,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3618653.3333333335, ans=0.125 2023-11-26 23:21:15,668 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1750, loss[loss=0.07716, simple_loss=0.1052, pruned_loss=0.01588, audio_tagging_loss=0.008707, over 16531.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08794, pruned_loss=0.01178, audio_tagging_loss=0.008891, over 3044872.93 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:21:19,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3618853.3333333335, ans=0.1 2023-11-26 23:21:38,169 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542850 2023-11-26 23:21:45,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.00 vs. 
limit=22.5 2023-11-26 23:21:51,590 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:22:08,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3619120.0, ans=0.0 2023-11-26 23:22:11,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.875e+01 9.667e+01 1.011e+02 1.829e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 23:22:11,460 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1800, loss[loss=0.0575, simple_loss=0.07457, pruned_loss=0.01088, audio_tagging_loss=0.009341, over 14834.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08778, pruned_loss=0.01172, audio_tagging_loss=0.008768, over 3044119.02 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:22:29,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3619253.3333333335, ans=0.125 2023-11-26 23:22:34,088 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542900 2023-11-26 23:23:00,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3619453.3333333335, ans=0.125 2023-11-26 23:23:02,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5 2023-11-26 23:23:07,872 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1850, loss[loss=0.07534, simple_loss=0.111, pruned_loss=0.0127, audio_tagging_loss=0.007128, over 14978.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08823, pruned_loss=0.01157, audio_tagging_loss=0.008705, over 3044920.98 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:23:16,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3619520.0, ans=0.0 2023-11-26 23:23:21,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3619586.6666666665, ans=0.2 2023-11-26 23:23:30,149 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542950 2023-11-26 23:23:32,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3619653.3333333335, ans=22.5 2023-11-26 23:24:02,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.49 vs. limit=6.0 2023-11-26 23:24:04,165 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1900, loss[loss=0.04671, simple_loss=0.05798, pruned_loss=0.003828, audio_tagging_loss=0.01389, over 15373.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08921, pruned_loss=0.01171, audio_tagging_loss=0.008652, over 3048356.33 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:24:05,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 9.189e+01 9.752e+01 1.031e+02 1.213e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-26 23:24:12,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3619853.3333333335, ans=0.125 2023-11-26 23:24:21,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.67 vs. 
limit=6.0 2023-11-26 23:24:26,786 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543000 2023-11-26 23:24:42,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0 2023-11-26 23:24:49,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3620120.0, ans=0.2 2023-11-26 23:24:56,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3620120.0, ans=0.125 2023-11-26 23:24:59,803 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1950, loss[loss=0.06982, simple_loss=0.09714, pruned_loss=0.01164, audio_tagging_loss=0.009613, over 16490.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08934, pruned_loss=0.01189, audio_tagging_loss=0.008521, over 3051997.41 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:25:07,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3620186.6666666665, ans=0.125 2023-11-26 23:25:17,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2023-11-26 23:25:22,769 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543050 2023-11-26 23:25:22,930 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:25:26,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3620320.0, ans=0.125 2023-11-26 23:25:38,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3620386.6666666665, ans=0.125 2023-11-26 23:25:40,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3620386.6666666665, ans=15.0 2023-11-26 23:25:40,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3620386.6666666665, ans=0.1 2023-11-26 23:25:49,783 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:25:55,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3620520.0, ans=0.0 2023-11-26 23:25:56,332 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2000, loss[loss=0.04921, simple_loss=0.06661, pruned_loss=0.006197, audio_tagging_loss=0.009705, over 14654.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08788, pruned_loss=0.01179, audio_tagging_loss=0.008623, over 3046282.95 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:25:57,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.817e+01 9.525e+01 1.016e+02 1.209e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 23:26:09,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3620586.6666666665, ans=0.0 2023-11-26 23:26:18,549 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543100 2023-11-26 23:26:26,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3620653.3333333335, ans=0.125 2023-11-26 23:26:49,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3620786.6666666665, ans=0.125 2023-11-26 23:26:52,270 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2050, loss[loss=0.07703, simple_loss=0.1095, pruned_loss=0.01505, audio_tagging_loss=0.007247, over 15846.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08786, pruned_loss=0.01177, audio_tagging_loss=0.008634, over 3038861.41 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:26:59,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.95 vs. limit=10.0 2023-11-26 23:27:12,893 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:27:14,778 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543150 2023-11-26 23:27:34,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3621053.3333333335, ans=0.125 2023-11-26 23:27:48,128 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2100, loss[loss=0.0613, simple_loss=0.08464, pruned_loss=0.009785, audio_tagging_loss=0.009196, over 16015.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08798, pruned_loss=0.0117, audio_tagging_loss=0.008605, over 3042704.78 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:27:50,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.873e+01 9.430e+01 1.020e+02 1.802e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 23:27:52,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3621186.6666666665, ans=0.0 2023-11-26 23:28:08,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3621253.3333333335, ans=0.125 2023-11-26 23:28:10,391 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543200 2023-11-26 23:28:13,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3621320.0, ans=0.0 2023-11-26 23:28:16,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3621320.0, ans=0.125 2023-11-26 23:28:26,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3621386.6666666665, ans=0.0 2023-11-26 23:28:40,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.60 vs. 
limit=15.0 2023-11-26 23:28:44,526 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2150, loss[loss=0.06389, simple_loss=0.08639, pruned_loss=0.01294, audio_tagging_loss=0.007764, over 16446.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08979, pruned_loss=0.01214, audio_tagging_loss=0.008541, over 3046575.49 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:28:53,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3621520.0, ans=0.05 2023-11-26 23:28:59,733 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:29:07,513 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543250 2023-11-26 23:29:09,773 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:29:17,488 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:29:27,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-11-26 23:29:28,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3621786.6666666665, ans=0.0 2023-11-26 23:29:32,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3621786.6666666665, ans=0.0 2023-11-26 23:29:41,014 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2200, loss[loss=0.05474, simple_loss=0.07133, pruned_loss=0.01133, audio_tagging_loss=0.00775, over 15154.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08899, pruned_loss=0.01209, audio_tagging_loss=0.008646, over 3043577.00 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:29:42,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3621853.3333333335, ans=0.1 2023-11-26 23:29:43,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.935e+01 9.696e+01 1.032e+02 1.602e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-26 23:29:47,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3621853.3333333335, ans=0.0 2023-11-26 23:30:03,484 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543300 2023-11-26 23:30:13,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3622053.3333333335, ans=0.125 2023-11-26 23:30:14,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3622053.3333333335, ans=0.0 2023-11-26 23:30:36,863 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2250, loss[loss=0.06382, simple_loss=0.08849, pruned_loss=0.01093, audio_tagging_loss=0.008646, over 15139.00 frames. 
], tot_loss[loss=0.06549, simple_loss=0.08929, pruned_loss=0.01219, audio_tagging_loss=0.00866, over 3049536.14 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:30:45,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2023-11-26 23:30:54,619 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:30:58,753 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543350 2023-11-26 23:31:02,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2023-11-26 23:31:03,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3622320.0, ans=0.0 2023-11-26 23:31:04,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.97 vs. limit=15.0 2023-11-26 23:31:25,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3622453.3333333335, ans=0.1 2023-11-26 23:31:32,019 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2300, loss[loss=0.06736, simple_loss=0.09945, pruned_loss=0.01136, audio_tagging_loss=0.006277, over 14814.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08891, pruned_loss=0.01211, audio_tagging_loss=0.008736, over 3046821.55 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:31:34,118 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 8.796e+01 9.547e+01 1.006e+02 1.160e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 23:31:35,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3622520.0, ans=10.0 2023-11-26 23:31:42,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-26 23:31:49,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-26 23:31:55,044 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543400 2023-11-26 23:31:55,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3622653.3333333335, ans=0.0 2023-11-26 23:32:09,969 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:32:11,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2023-11-26 23:32:12,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3622720.0, ans=0.0 2023-11-26 23:32:14,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2023-11-26 23:32:20,293 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:32:27,719 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2350, loss[loss=0.05671, simple_loss=0.07409, pruned_loss=0.01053, audio_tagging_loss=0.009134, over 14922.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08904, pruned_loss=0.01236, audio_tagging_loss=0.008794, over 3044504.18 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:32:36,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=22.5 2023-11-26 23:32:39,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3622920.0, ans=0.0 2023-11-26 23:32:39,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2023-11-26 23:32:51,410 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543450 2023-11-26 23:33:01,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3623053.3333333335, ans=0.0 2023-11-26 23:33:25,279 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2400, loss[loss=0.08264, simple_loss=0.1257, pruned_loss=0.01225, audio_tagging_loss=0.007553, over 15098.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08912, pruned_loss=0.01239, audio_tagging_loss=0.008836, over 3044363.34 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:33:27,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.979e+01 9.586e+01 1.037e+02 1.629e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 23:33:29,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3623186.6666666665, ans=0.125 2023-11-26 23:33:35,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.32 vs. limit=15.0 2023-11-26 23:33:47,468 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543500 2023-11-26 23:33:49,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3623320.0, ans=0.125 2023-11-26 23:33:52,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2023-11-26 23:34:16,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3623453.3333333335, ans=0.125 2023-11-26 23:34:21,519 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2450, loss[loss=0.07624, simple_loss=0.1082, pruned_loss=0.0137, audio_tagging_loss=0.00843, over 14691.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08938, pruned_loss=0.01228, audio_tagging_loss=0.008959, over 3046597.50 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:34:25,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.31 vs. 
limit=15.0 2023-11-26 23:34:38,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3623586.6666666665, ans=0.025 2023-11-26 23:34:41,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3623586.6666666665, ans=0.125 2023-11-26 23:34:44,222 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543550 2023-11-26 23:34:57,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3623720.0, ans=0.0 2023-11-26 23:35:06,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3623786.6666666665, ans=0.0 2023-11-26 23:35:16,743 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2500, loss[loss=0.08954, simple_loss=0.1309, pruned_loss=0.01896, audio_tagging_loss=0.005141, over 15916.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08979, pruned_loss=0.01233, audio_tagging_loss=0.008981, over 3046507.50 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:35:18,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.886e+01 9.376e+01 1.002e+02 1.338e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 23:35:20,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3623853.3333333335, ans=0.125 2023-11-26 23:35:40,208 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543600 2023-11-26 23:36:14,153 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2550, loss[loss=0.07659, simple_loss=0.1033, pruned_loss=0.01526, audio_tagging_loss=0.009689, over 15221.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09012, pruned_loss=0.01237, audio_tagging_loss=0.008848, over 3043172.59 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:36:27,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3624253.3333333335, ans=0.025 2023-11-26 23:36:27,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3624253.3333333335, ans=0.1 2023-11-26 23:36:36,156 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543650 2023-11-26 23:36:36,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3624320.0, ans=0.0 2023-11-26 23:36:36,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-26 23:36:46,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3624386.6666666665, ans=0.125 2023-11-26 23:37:04,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3624453.3333333335, ans=0.125 2023-11-26 23:37:08,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3624453.3333333335, ans=0.125 2023-11-26 23:37:09,907 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2600, loss[loss=0.07375, simple_loss=0.1033, pruned_loss=0.01421, audio_tagging_loss=0.007907, over 14511.00 frames. 
], tot_loss[loss=0.06533, simple_loss=0.0889, pruned_loss=0.01217, audio_tagging_loss=0.008708, over 3040248.13 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:37:11,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.743e+01 9.424e+01 1.014e+02 1.712e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 23:37:18,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3624520.0, ans=0.0 2023-11-26 23:37:31,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543700 2023-11-26 23:38:05,156 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2650, loss[loss=0.08335, simple_loss=0.12, pruned_loss=0.01478, audio_tagging_loss=0.008571, over 16358.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08825, pruned_loss=0.012, audio_tagging_loss=0.008666, over 3044490.16 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:38:21,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2023-11-26 23:38:22,603 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:38:27,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2023-11-26 23:38:28,283 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543750 2023-11-26 23:38:36,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5 2023-11-26 23:38:52,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0 2023-11-26 23:38:54,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3625120.0, ans=0.09899494936611666 2023-11-26 23:38:58,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3625120.0, ans=0.125 2023-11-26 23:38:58,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3625120.0, ans=0.125 2023-11-26 23:39:01,799 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2700, loss[loss=0.06475, simple_loss=0.09233, pruned_loss=0.01147, audio_tagging_loss=0.007109, over 15643.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08856, pruned_loss=0.012, audio_tagging_loss=0.008604, over 3053299.23 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:39:03,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.924e+01 9.565e+01 1.006e+02 1.395e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 23:39:20,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3625253.3333333335, ans=0.0 2023-11-26 23:39:24,289 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543800 2023-11-26 23:39:42,776 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:39:43,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=15.0 2023-11-26 23:39:44,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3625386.6666666665, ans=0.0 2023-11-26 23:39:45,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.02 vs. limit=22.5 2023-11-26 23:39:52,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3625453.3333333335, ans=0.125 2023-11-26 23:39:58,529 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2750, loss[loss=0.08397, simple_loss=0.1276, pruned_loss=0.01329, audio_tagging_loss=0.006863, over 14374.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08805, pruned_loss=0.01187, audio_tagging_loss=0.00867, over 3051231.31 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:40:20,226 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543850 2023-11-26 23:40:27,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3625653.3333333335, ans=0.125 2023-11-26 23:40:37,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3625720.0, ans=0.0 2023-11-26 23:40:37,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3625720.0, ans=0.125 2023-11-26 23:40:44,670 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:40:44,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3625786.6666666665, ans=0.0 2023-11-26 23:40:51,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.73 vs. limit=6.0 2023-11-26 23:40:53,062 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2800, loss[loss=0.06637, simple_loss=0.0923, pruned_loss=0.01256, audio_tagging_loss=0.007663, over 15856.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08754, pruned_loss=0.01174, audio_tagging_loss=0.008656, over 3058480.83 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:40:53,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2023-11-26 23:40:55,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.871e+01 8.947e+01 9.554e+01 1.028e+02 1.223e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 23:41:11,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3625920.0, ans=0.05 2023-11-26 23:41:12,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3625920.0, ans=0.125 2023-11-26 23:41:15,469 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543900 2023-11-26 23:41:17,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3625986.6666666665, ans=0.125 2023-11-26 23:41:41,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3626120.0, ans=0.1 2023-11-26 23:41:42,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3626120.0, ans=0.07 2023-11-26 23:41:46,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3626120.0, ans=0.125 2023-11-26 23:41:49,639 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2850, loss[loss=0.05607, simple_loss=0.07964, pruned_loss=0.008264, audio_tagging_loss=0.007984, over 16483.00 frames. ], tot_loss[loss=0.06378, simple_loss=0.08695, pruned_loss=0.01165, audio_tagging_loss=0.008656, over 3053256.29 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:41:57,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3626186.6666666665, ans=0.0 2023-11-26 23:42:00,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3626253.3333333335, ans=0.125 2023-11-26 23:42:12,193 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543950 2023-11-26 23:42:23,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2023-11-26 23:42:45,068 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2900, loss[loss=0.06184, simple_loss=0.08196, pruned_loss=0.01177, audio_tagging_loss=0.009094, over 14551.00 frames. ], tot_loss[loss=0.06377, simple_loss=0.08664, pruned_loss=0.01178, audio_tagging_loss=0.008665, over 3047597.59 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:42:47,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.936e+01 9.597e+01 1.046e+02 1.381e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 23:43:02,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3626586.6666666665, ans=0.125 2023-11-26 23:43:07,733 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544000 2023-11-26 23:43:15,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. 
limit=15.0 2023-11-26 23:43:40,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3626786.6666666665, ans=0.0 2023-11-26 23:43:44,236 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2950, loss[loss=0.07249, simple_loss=0.09993, pruned_loss=0.01375, audio_tagging_loss=0.008775, over 13991.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08799, pruned_loss=0.01202, audio_tagging_loss=0.008581, over 3048262.77 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:43:44,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3626853.3333333335, ans=0.09899494936611666 2023-11-26 23:44:01,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-11-26 23:44:06,774 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544050 2023-11-26 23:44:08,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2023-11-26 23:44:23,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3627053.3333333335, ans=0.1 2023-11-26 23:44:24,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3627053.3333333335, ans=0.2 2023-11-26 23:44:25,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3627053.3333333335, ans=0.125 2023-11-26 23:44:27,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3627053.3333333335, ans=0.125 2023-11-26 23:44:31,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3627120.0, ans=0.05 2023-11-26 23:44:38,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3627120.0, ans=0.125 2023-11-26 23:44:40,278 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3000, loss[loss=0.05728, simple_loss=0.07913, pruned_loss=0.007988, audio_tagging_loss=0.009731, over 15731.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08802, pruned_loss=0.01206, audio_tagging_loss=0.008649, over 3046406.34 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:44:40,279 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-26 23:45:12,592 INFO [train_asr.py:1267] (1/4) Epoch 46, validation: loss=0.0572, simple_loss=0.05043, pruned_loss=0.00523, audio_tagging_loss=0.02676, over 4681554.00 frames. 2023-11-26 23:45:12,593 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-26 23:45:15,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.002e+01 9.589e+01 1.016e+02 1.351e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 23:45:35,146 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544100 2023-11-26 23:45:39,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.73 vs. 
limit=15.0 2023-11-26 23:45:40,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3627320.0, ans=0.1 2023-11-26 23:45:50,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3627386.6666666665, ans=0.125 2023-11-26 23:45:57,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3627453.3333333335, ans=0.1 2023-11-26 23:46:08,625 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3050, loss[loss=0.07435, simple_loss=0.09769, pruned_loss=0.01666, audio_tagging_loss=0.008842, over 16024.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08914, pruned_loss=0.01222, audio_tagging_loss=0.008664, over 3054468.12 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:46:14,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3627520.0, ans=0.125 2023-11-26 23:46:30,852 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544150 2023-11-26 23:46:39,921 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:46:52,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3627786.6666666665, ans=0.0 2023-11-26 23:47:04,322 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3100, loss[loss=0.07419, simple_loss=0.1052, pruned_loss=0.01286, audio_tagging_loss=0.008709, over 15589.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.0889, pruned_loss=0.01205, audio_tagging_loss=0.008669, over 3052337.04 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:47:08,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.067e+01 9.651e+01 1.052e+02 1.316e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-26 23:47:24,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3627920.0, ans=0.1 2023-11-26 23:47:25,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3627986.6666666665, ans=0.125 2023-11-26 23:47:27,267 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544200 2023-11-26 23:47:42,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.87 vs. limit=15.0 2023-11-26 23:47:46,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3628053.3333333335, ans=0.125 2023-11-26 23:48:01,335 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3150, loss[loss=0.07244, simple_loss=0.0924, pruned_loss=0.01692, audio_tagging_loss=0.009314, over 15231.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08959, pruned_loss=0.01217, audio_tagging_loss=0.008664, over 3052537.54 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:48:03,679 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:48:09,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3628186.6666666665, ans=0.0 2023-11-26 23:48:10,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3628186.6666666665, ans=0.125 2023-11-26 23:48:20,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3628253.3333333335, ans=0.0 2023-11-26 23:48:20,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3628253.3333333335, ans=0.2 2023-11-26 23:48:23,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544250 2023-11-26 23:48:35,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.34 vs. limit=22.5 2023-11-26 23:48:49,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3628453.3333333335, ans=0.125 2023-11-26 23:48:57,458 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3200, loss[loss=0.04901, simple_loss=0.07071, pruned_loss=0.005935, audio_tagging_loss=0.007725, over 14471.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09072, pruned_loss=0.01222, audio_tagging_loss=0.008749, over 3050827.20 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:49:00,631 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.824e+01 9.434e+01 1.022e+02 1.249e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 23:49:05,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3628520.0, ans=0.125 2023-11-26 23:49:13,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3628586.6666666665, ans=0.0 2023-11-26 23:49:15,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3628586.6666666665, ans=0.0 2023-11-26 23:49:19,852 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544300 2023-11-26 23:49:27,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3628653.3333333335, ans=0.2 2023-11-26 23:49:53,366 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3250, loss[loss=0.04465, simple_loss=0.05632, pruned_loss=0.007276, audio_tagging_loss=0.009219, over 15037.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09028, pruned_loss=0.01216, audio_tagging_loss=0.008756, over 3056139.49 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:49:54,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3628853.3333333335, ans=0.2 2023-11-26 23:49:58,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3628853.3333333335, ans=0.2 2023-11-26 23:50:00,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3628853.3333333335, ans=0.125 2023-11-26 23:50:01,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3628853.3333333335, ans=0.0 2023-11-26 23:50:13,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=15.0 2023-11-26 23:50:15,642 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544350 2023-11-26 23:50:15,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3628986.6666666665, ans=0.0 2023-11-26 23:50:21,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3628986.6666666665, ans=0.1 2023-11-26 23:50:48,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3629186.6666666665, ans=0.2 2023-11-26 23:50:48,931 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3300, loss[loss=0.07311, simple_loss=0.1014, pruned_loss=0.01355, audio_tagging_loss=0.008849, over 15421.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09083, pruned_loss=0.01224, audio_tagging_loss=0.008852, over 3056664.96 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:50:52,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.136e+01 9.828e+01 1.104e+02 1.362e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-26 23:50:59,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3629253.3333333335, ans=0.1 2023-11-26 23:51:05,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3629253.3333333335, ans=0.125 2023-11-26 23:51:11,476 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544400 2023-11-26 23:51:33,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3629453.3333333335, ans=0.0 2023-11-26 23:51:43,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3629453.3333333335, ans=0.125 2023-11-26 23:51:45,137 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3350, loss[loss=0.03697, simple_loss=0.05094, pruned_loss=0.003015, audio_tagging_loss=0.008479, over 15749.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09087, pruned_loss=0.01227, audio_tagging_loss=0.008779, over 3055914.96 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:51:52,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3629520.0, ans=10.0 2023-11-26 23:52:06,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2023-11-26 23:52:07,890 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544450 2023-11-26 23:52:08,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-26 23:52:20,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=12.0 2023-11-26 23:52:40,873 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3400, loss[loss=0.07662, simple_loss=0.1147, pruned_loss=0.014, audio_tagging_loss=0.005252, over 15883.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09072, pruned_loss=0.0121, audio_tagging_loss=0.008665, over 3053838.51 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:52:44,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3629853.3333333335, ans=0.0 2023-11-26 23:52:45,597 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.870e+01 9.488e+01 1.024e+02 1.498e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 23:52:51,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3629920.0, ans=0.125 2023-11-26 23:52:53,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3629920.0, ans=0.0 2023-11-26 23:52:58,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3629920.0, ans=0.125 2023-11-26 23:53:02,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3629986.6666666665, ans=0.125 2023-11-26 23:53:03,579 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544500 2023-11-26 23:53:37,194 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3450, loss[loss=0.05767, simple_loss=0.07533, pruned_loss=0.009888, audio_tagging_loss=0.01011, over 14811.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09114, pruned_loss=0.0122, audio_tagging_loss=0.008565, over 3051833.65 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:53:38,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3630186.6666666665, ans=0.0 2023-11-26 23:53:41,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.78 vs. 
limit=22.5 2023-11-26 23:53:43,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3630186.6666666665, ans=0.125 2023-11-26 23:53:47,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3630253.3333333335, ans=0.0 2023-11-26 23:53:58,888 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544550 2023-11-26 23:54:09,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-11-26 23:54:26,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3630453.3333333335, ans=0.125 2023-11-26 23:54:32,553 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3500, loss[loss=0.04693, simple_loss=0.05838, pruned_loss=0.009295, audio_tagging_loss=0.008446, over 16395.00 frames. ], tot_loss[loss=0.066, simple_loss=0.0907, pruned_loss=0.01216, audio_tagging_loss=0.008496, over 3056905.21 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:54:36,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 9.117e+01 9.795e+01 1.053e+02 1.409e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-26 23:54:40,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3630520.0, ans=0.125 2023-11-26 23:54:50,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3630586.6666666665, ans=0.125 2023-11-26 23:54:55,516 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544600 2023-11-26 23:55:00,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0 2023-11-26 23:55:01,006 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:55:26,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3630786.6666666665, ans=0.07 2023-11-26 23:55:26,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3630786.6666666665, ans=0.0 2023-11-26 23:55:28,215 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3550, loss[loss=0.07736, simple_loss=0.1099, pruned_loss=0.01488, audio_tagging_loss=0.007543, over 15253.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09029, pruned_loss=0.01206, audio_tagging_loss=0.008506, over 3052392.19 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:55:44,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. 
limit=15.0 2023-11-26 23:55:51,554 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544650 2023-11-26 23:55:55,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3630986.6666666665, ans=0.0 2023-11-26 23:56:11,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3631053.3333333335, ans=0.0 2023-11-26 23:56:17,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3631120.0, ans=0.07 2023-11-26 23:56:23,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3631120.0, ans=0.0 2023-11-26 23:56:25,421 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3600, loss[loss=0.05134, simple_loss=0.06273, pruned_loss=0.008745, audio_tagging_loss=0.01123, over 14580.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08965, pruned_loss=0.0122, audio_tagging_loss=0.008536, over 3046751.45 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:56:29,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.770e+01 9.299e+01 1.012e+02 1.507e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 23:56:37,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3631253.3333333335, ans=0.125 2023-11-26 23:56:38,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3631253.3333333335, ans=0.125 2023-11-26 23:56:40,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3631253.3333333335, ans=0.1 2023-11-26 23:56:45,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3631253.3333333335, ans=0.1 2023-11-26 23:56:47,220 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544700 2023-11-26 23:56:47,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3631320.0, ans=0.0 2023-11-26 23:56:48,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3631320.0, ans=0.125 2023-11-26 23:57:06,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3631386.6666666665, ans=0.0 2023-11-26 23:57:20,904 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3650, loss[loss=0.07484, simple_loss=0.09726, pruned_loss=0.01544, audio_tagging_loss=0.01077, over 15551.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09, pruned_loss=0.01219, audio_tagging_loss=0.008445, over 3046618.33 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:57:26,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3631520.0, ans=0.05 2023-11-26 23:57:40,823 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:57:43,359 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544750 2023-11-26 23:57:53,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3631720.0, ans=0.1 2023-11-26 23:58:11,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3631786.6666666665, ans=0.0 2023-11-26 23:58:16,362 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3700, loss[loss=0.04655, simple_loss=0.0585, pruned_loss=0.006683, audio_tagging_loss=0.01061, over 17648.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09035, pruned_loss=0.01231, audio_tagging_loss=0.008381, over 3049741.08 frames. ], batch size: 69, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:58:20,627 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.914e+01 9.498e+01 1.020e+02 1.600e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 23:58:20,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3631853.3333333335, ans=0.125 2023-11-26 23:58:24,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3631853.3333333335, ans=0.1 2023-11-26 23:58:27,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3631920.0, ans=0.125 2023-11-26 23:58:38,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3631920.0, ans=0.125 2023-11-26 23:58:40,016 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544800 2023-11-26 23:58:40,179 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:58:41,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3631986.6666666665, ans=0.1 2023-11-26 23:58:42,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3631986.6666666665, ans=0.125 2023-11-26 23:58:44,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3631986.6666666665, ans=0.1 2023-11-26 23:58:58,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3632053.3333333335, ans=0.0 2023-11-26 23:59:09,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3632120.0, ans=0.0 2023-11-26 23:59:12,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-26 23:59:13,826 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3750, loss[loss=0.06205, simple_loss=0.09002, pruned_loss=0.008924, audio_tagging_loss=0.008121, over 16284.00 frames. 
], tot_loss[loss=0.06613, simple_loss=0.09063, pruned_loss=0.01238, audio_tagging_loss=0.008441, over 3057455.36 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:59:14,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2023-11-26 23:59:15,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3632186.6666666665, ans=0.125 2023-11-26 23:59:28,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2023-11-26 23:59:35,697 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544850 2023-11-26 23:59:45,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3632386.6666666665, ans=0.125 2023-11-26 23:59:45,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632386.6666666665, ans=0.1 2023-11-26 23:59:51,103 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:00:09,631 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3800, loss[loss=0.07711, simple_loss=0.1088, pruned_loss=0.0138, audio_tagging_loss=0.00889, over 14950.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09019, pruned_loss=0.01227, audio_tagging_loss=0.008546, over 3055833.74 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:00:14,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 9.124e+01 9.737e+01 1.067e+02 1.479e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:00:18,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.83 vs. limit=15.0 2023-11-27 00:00:20,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3632586.6666666665, ans=0.0 2023-11-27 00:00:31,627 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544900 2023-11-27 00:00:46,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632720.0, ans=0.1 2023-11-27 00:00:48,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3632720.0, ans=0.125 2023-11-27 00:00:52,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3632720.0, ans=0.125 2023-11-27 00:01:01,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3632786.6666666665, ans=0.125 2023-11-27 00:01:04,884 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3850, loss[loss=0.06804, simple_loss=0.09193, pruned_loss=0.0144, audio_tagging_loss=0.007684, over 15463.00 frames. 
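], tot_loss[loss=0.06604, simple_loss=0.09025, pruned_loss=0.01228, audio_tagging_loss=0.008625, over 3049803.79 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0

The "Exclude cut" WARNINGs above are a length filter at work: the one-second AudioSet cuts carry only a dummy transcript, and after the encoder front-end's subsampling a 100-frame cut keeps just 23 frames for 24 BPE tokens; a transducer cannot emit more tokens than it has frames, so the cut is dropped. A minimal sketch of such a check, with the subsampling arithmetic inferred from the 100 -> 23 figures in the warnings (the helper name is ours):

```python
def keep_for_transducer(num_frames: int, num_tokens: int) -> bool:
    # The conv front-end shrinks T roughly as ((T - 7) // 2 + 1) // 2,
    # which reproduces the logged 100 -> 23 frames.
    t = ((num_frames - 7) // 2 + 1) // 2
    # A transducer needs at least one frame per output token, so
    # 23 frames cannot align the 24-token dummy text.
    return t >= num_tokens

print(keep_for_transducer(100, 24))   # False -> "Exclude cut ..." warning
print(keep_for_transducer(1500, 24))  # True  -> a normal ~15 s cut trains
```

Only the dummy-text cuts trip this check here, which is why every excluded ID comes from the unbalanced AudioSet split.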
2023-11-27 00:01:06,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3632853.3333333335, ans=0.125 2023-11-27 00:01:08,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3632853.3333333335, ans=0.0 2023-11-27 00:01:19,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.14 vs. limit=15.0 2023-11-27 00:01:26,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3632920.0, ans=0.2 2023-11-27 00:01:27,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=12.0 2023-11-27 00:01:28,641 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544950 2023-11-27 00:01:31,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3632986.6666666665, ans=0.0 2023-11-27 00:01:33,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3632986.6666666665, ans=0.125 2023-11-27 00:01:38,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2023-11-27 00:01:48,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3633120.0, ans=0.09899494936611666 2023-11-27 00:01:49,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3633120.0, ans=10.0 2023-11-27 00:01:49,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3633120.0, ans=0.125 2023-11-27 00:01:58,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3633120.0, ans=0.0 2023-11-27 00:02:01,474 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3900, loss[loss=0.05095, simple_loss=0.07539, pruned_loss=0.006474, audio_tagging_loss=0.006774, over 14728.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08922, pruned_loss=0.01216, audio_tagging_loss=0.0088, over 3042602.49 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:02:07,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.766e+01 9.510e+01 1.042e+02 1.590e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 00:02:08,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3633186.6666666665, ans=0.2 2023-11-27 00:02:09,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.77 vs.
limit=5.0 2023-11-27 00:02:12,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3633253.3333333335, ans=0.1 2023-11-27 00:02:23,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545000 2023-11-27 00:02:45,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=15.0 2023-11-27 00:02:47,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.96 vs. limit=22.5 2023-11-27 00:02:49,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2023-11-27 00:02:58,143 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3950, loss[loss=0.05703, simple_loss=0.07754, pruned_loss=0.009867, audio_tagging_loss=0.008393, over 16045.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08942, pruned_loss=0.01223, audio_tagging_loss=0.008781, over 3040550.46 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:02:59,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3633520.0, ans=0.07 2023-11-27 00:03:00,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3633520.0, ans=0.0 2023-11-27 00:03:14,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=10.0 2023-11-27 00:03:15,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3633586.6666666665, ans=0.125 2023-11-27 00:03:19,625 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545050 2023-11-27 00:03:40,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2023-11-27 00:03:48,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3633786.6666666665, ans=0.125 2023-11-27 00:03:52,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3633853.3333333335, ans=0.1 2023-11-27 00:03:53,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0 2023-11-27 00:03:53,747 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4000, loss[loss=0.04929, simple_loss=0.06171, pruned_loss=0.0078, audio_tagging_loss=0.01064, over 15860.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08944, pruned_loss=0.01235, audio_tagging_loss=0.008825, over 3041753.38 frames. 
], batch size: 64, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:03:59,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 9.088e+01 9.544e+01 1.045e+02 1.311e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 00:04:01,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3633853.3333333335, ans=0.1 2023-11-27 00:04:16,120 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545100 2023-11-27 00:04:25,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3633986.6666666665, ans=0.1 2023-11-27 00:04:33,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3634053.3333333335, ans=0.1 2023-11-27 00:04:43,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3634120.0, ans=0.125 2023-11-27 00:04:49,498 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4050, loss[loss=0.08246, simple_loss=0.12, pruned_loss=0.01478, audio_tagging_loss=0.007696, over 15306.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08906, pruned_loss=0.0121, audio_tagging_loss=0.008917, over 3039098.27 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:04:52,285 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:04:54,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-27 00:05:12,217 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545150 2023-11-27 00:05:22,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3634386.6666666665, ans=0.125 2023-11-27 00:05:30,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3634386.6666666665, ans=0.125 2023-11-27 00:05:40,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3634453.3333333335, ans=0.2 2023-11-27 00:05:43,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3634453.3333333335, ans=0.04949747468305833 2023-11-27 00:05:46,175 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4100, loss[loss=0.06181, simple_loss=0.08317, pruned_loss=0.0109, audio_tagging_loss=0.009323, over 14679.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08959, pruned_loss=0.01221, audio_tagging_loss=0.008787, over 3040949.71 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:05:48,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.33 vs. 
limit=15.0 2023-11-27 00:05:52,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.888e+01 9.665e+01 1.037e+02 1.522e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 00:06:01,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.40 vs. limit=10.0 2023-11-27 00:06:07,382 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545200 2023-11-27 00:06:27,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3634720.0, ans=0.025 2023-11-27 00:06:41,831 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4150, loss[loss=0.07274, simple_loss=0.08771, pruned_loss=0.01894, audio_tagging_loss=0.009946, over 15037.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09004, pruned_loss=0.0124, audio_tagging_loss=0.008699, over 3037764.89 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:06:57,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3634920.0, ans=0.2 2023-11-27 00:07:02,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3634920.0, ans=0.125 2023-11-27 00:07:04,261 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545250 2023-11-27 00:07:06,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3634986.6666666665, ans=0.125 2023-11-27 00:07:22,302 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:07:24,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3635053.3333333335, ans=0.125 2023-11-27 00:07:28,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2023-11-27 00:07:32,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3635120.0, ans=0.0 2023-11-27 00:07:37,621 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4200, loss[loss=0.06387, simple_loss=0.08021, pruned_loss=0.01595, audio_tagging_loss=0.00782, over 15776.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09016, pruned_loss=0.01246, audio_tagging_loss=0.008583, over 3040868.60 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:07:39,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. 
limit=6.0 2023-11-27 00:07:44,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.031e+01 9.580e+01 1.007e+02 1.196e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 00:07:46,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3635186.6666666665, ans=0.125 2023-11-27 00:07:54,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3635253.3333333335, ans=0.125 2023-11-27 00:08:00,815 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545300 2023-11-27 00:08:02,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3635320.0, ans=0.125 2023-11-27 00:08:18,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3635386.6666666665, ans=0.125 2023-11-27 00:08:23,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.96 vs. limit=22.5 2023-11-27 00:08:24,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3635453.3333333335, ans=0.125 2023-11-27 00:08:24,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3635453.3333333335, ans=0.0 2023-11-27 00:08:30,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3635453.3333333335, ans=0.2 2023-11-27 00:08:33,923 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4250, loss[loss=0.06636, simple_loss=0.09205, pruned_loss=0.0114, audio_tagging_loss=0.008929, over 16453.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08999, pruned_loss=0.01231, audio_tagging_loss=0.008595, over 3046211.41 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:08:49,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5 2023-11-27 00:08:56,299 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545350 2023-11-27 00:09:08,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3635720.0, ans=0.125 2023-11-27 00:09:11,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3635720.0, ans=0.1 2023-11-27 00:09:13,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2023-11-27 00:09:18,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3635786.6666666665, ans=0.125 2023-11-27 00:09:22,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3635786.6666666665, ans=0.0 2023-11-27 00:09:25,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3635786.6666666665, ans=0.1 2023-11-27 00:09:30,142 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4300, loss[loss=0.06131, simple_loss=0.09359, pruned_loss=0.008905, audio_tagging_loss=0.005611, over 16959.00 frames. 
], tot_loss[loss=0.06592, simple_loss=0.09027, pruned_loss=0.0123, audio_tagging_loss=0.008486, over 3049654.68 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:09:36,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 9.001e+01 9.508e+01 1.030e+02 1.268e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 00:09:52,688 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545400 2023-11-27 00:10:00,287 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:10:02,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3635986.6666666665, ans=0.0 2023-11-27 00:10:07,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3636053.3333333335, ans=0.0 2023-11-27 00:10:25,664 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4350, loss[loss=0.06545, simple_loss=0.09164, pruned_loss=0.01247, audio_tagging_loss=0.007156, over 14929.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09167, pruned_loss=0.01244, audio_tagging_loss=0.008372, over 3043872.19 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:10:49,116 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545450 2023-11-27 00:10:49,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3636320.0, ans=0.1 2023-11-27 00:10:50,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3636320.0, ans=0.125 2023-11-27 00:10:59,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3636386.6666666665, ans=0.125 2023-11-27 00:11:03,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3636386.6666666665, ans=0.125 2023-11-27 00:11:03,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3636386.6666666665, ans=0.125 2023-11-27 00:11:22,382 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4400, loss[loss=0.05453, simple_loss=0.07624, pruned_loss=0.00616, audio_tagging_loss=0.01025, over 13963.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09086, pruned_loss=0.01242, audio_tagging_loss=0.008407, over 3039174.84 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:11:24,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=22.5 2023-11-27 00:11:30,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.966e+01 9.047e+01 9.734e+01 1.041e+02 1.241e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:11:38,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3636586.6666666665, ans=0.0 2023-11-27 00:11:45,036 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545500 2023-11-27 00:12:01,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.79 vs. 
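limit=15.0

The recurring "Whitening: name=..., metric=X vs. limit=Y" lines from scaling.py:1022 are a diagnostic on module activations: the metric is about 1.0 when the channel covariance is proportional to the identity (fully "white") and approaches num_channels when the activations collapse onto a few directions; a penalty is only applied once it exceeds the limit. A rough sketch of such a metric (the exact normalization in scaling.py may differ):

```python
import numpy as np

def whitening_metric(x: np.ndarray) -> float:
    # x: (num_frames, num_channels) activations of one module.
    # Returns ~1.0 for "white" (isotropic) features, up to num_channels
    # when everything collapses onto a single direction.
    x = x - x.mean(axis=0)
    cov = x.T @ x / len(x)                              # channel covariance
    num_channels = cov.shape[0]
    mean_sq_eig = (cov ** 2).sum() / num_channels       # mean squared eigenvalue
    sq_mean_eig = (np.trace(cov) / num_channels) ** 2   # squared mean eigenvalue
    return float(mean_sq_eig / sq_mean_eig)

rng = np.random.default_rng(0)
print(whitening_metric(rng.normal(size=(1000, 384))))                    # ~1.4
print(whitening_metric(np.outer(rng.normal(size=1000), np.ones(384))))  # ~384
```

On that scale, a reading like metric=5.79 against limit=15.0 for a 384-channel module indicates comfortably well-conditioned activations.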
2023-11-27 00:12:18,836 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4450, loss[loss=0.05549, simple_loss=0.0686, pruned_loss=0.01062, audio_tagging_loss=0.01057, over 14456.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09029, pruned_loss=0.01218, audio_tagging_loss=0.008381, over 3044985.58 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:12:36,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3636920.0, ans=0.125 2023-11-27 00:12:39,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3636920.0, ans=0.125 2023-11-27 00:12:41,856 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545550 2023-11-27 00:13:14,869 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4500, loss[loss=0.07893, simple_loss=0.1089, pruned_loss=0.01615, audio_tagging_loss=0.008338, over 15703.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08986, pruned_loss=0.01205, audio_tagging_loss=0.008392, over 3050131.10 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:13:17,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3637186.6666666665, ans=0.125 2023-11-27 00:13:17,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3637186.6666666665, ans=0.1 2023-11-27 00:13:22,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3637186.6666666665, ans=0.125 2023-11-27 00:13:23,384 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.728e+01 9.573e+01 1.027e+02 1.215e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 00:13:35,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3637253.3333333335, ans=0.0 2023-11-27 00:13:37,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545600 2023-11-27 00:13:44,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3637320.0, ans=0.125 2023-11-27 00:13:48,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.24 vs. limit=15.0 2023-11-27 00:13:54,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3637386.6666666665, ans=0.0 2023-11-27 00:14:08,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3637453.3333333335, ans=0.125 2023-11-27 00:14:11,576 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4550, loss[loss=0.05369, simple_loss=0.07419, pruned_loss=0.007418, audio_tagging_loss=0.009181, over 14513.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08892, pruned_loss=0.01199, audio_tagging_loss=0.008448, over 3043000.75 frames.
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:14:26,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3637586.6666666665, ans=0.0 2023-11-27 00:14:33,596 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545650 2023-11-27 00:14:41,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2023-11-27 00:14:54,429 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:14:58,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3637786.6666666665, ans=0.1 2023-11-27 00:15:07,697 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4600, loss[loss=0.0819, simple_loss=0.1111, pruned_loss=0.0173, audio_tagging_loss=0.009043, over 16186.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08848, pruned_loss=0.01193, audio_tagging_loss=0.008641, over 3040722.82 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:15:10,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3637853.3333333335, ans=0.0 2023-11-27 00:15:11,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3637853.3333333335, ans=0.2 2023-11-27 00:15:15,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.975e+01 9.578e+01 1.039e+02 1.809e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 00:15:29,879 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545700 2023-11-27 00:15:36,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3637986.6666666665, ans=0.125 2023-11-27 00:15:43,858 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:15:54,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3638120.0, ans=0.0 2023-11-27 00:15:59,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3638120.0, ans=0.1 2023-11-27 00:16:02,963 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4650, loss[loss=0.06728, simple_loss=0.09436, pruned_loss=0.01288, audio_tagging_loss=0.007218, over 14043.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08849, pruned_loss=0.01187, audio_tagging_loss=0.008775, over 3034971.16 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:16:05,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3638186.6666666665, ans=0.125 2023-11-27 00:16:09,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=22.5 2023-11-27 00:16:10,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3638186.6666666665, ans=0.0 2023-11-27 00:16:10,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3638186.6666666665, ans=0.1 2023-11-27 00:16:18,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3638253.3333333335, ans=0.0 2023-11-27 00:16:25,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0 2023-11-27 00:16:26,554 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545750 2023-11-27 00:16:45,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-11-27 00:16:45,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3638386.6666666665, ans=0.125 2023-11-27 00:16:59,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3638520.0, ans=0.125 2023-11-27 00:16:59,969 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4700, loss[loss=0.06587, simple_loss=0.08478, pruned_loss=0.01411, audio_tagging_loss=0.009373, over 16377.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08914, pruned_loss=0.01204, audio_tagging_loss=0.008815, over 3036334.63 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:17:01,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3638520.0, ans=0.125 2023-11-27 00:17:07,415 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 9.156e+01 9.734e+01 1.046e+02 1.264e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:17:15,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.80 vs. limit=5.0 2023-11-27 00:17:21,948 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545800 2023-11-27 00:17:30,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3638653.3333333335, ans=0.0 2023-11-27 00:17:36,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3638720.0, ans=0.125 2023-11-27 00:17:43,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3638720.0, ans=0.125 2023-11-27 00:17:56,618 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4750, loss[loss=0.06585, simple_loss=0.09142, pruned_loss=0.01157, audio_tagging_loss=0.008568, over 15760.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08798, pruned_loss=0.01188, audio_tagging_loss=0.008894, over 3035848.62 frames. 
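], batch size: 57, lr: 1.47e-03, grad_scale: 16.0

Each train_asr.py:1235 line reports two per-frame figures: loss[...] for the current batch and tot_loss[...] as a frame-weighted running average over roughly three million recent frames. The totals are consistent with loss = 0.5*simple_loss + pruned_loss + audio_tagging_loss (e.g. 0.5*0.1147 + 0.014 + 0.005252 ~ 0.0766 for batch 3400 earlier); the 0.5 weight is read off the numbers, not from the code. A sketch of the bookkeeping, with the tracker's window/reset behaviour left out as an assumption:

```python
from collections import defaultdict

def combined_loss(simple, pruned, audio_tagging,
                  simple_scale=0.5, audio_tagging_scale=1.0):
    # Weighting inferred from the logged figures, not taken from train_asr.py.
    return simple_scale * simple + pruned + audio_tagging_scale * audio_tagging

class LossTracker:
    """Frame-weighted running averages, as in the tot_loss[...] fields."""
    def __init__(self):
        self.sums, self.frames = defaultdict(float), 0.0
    def update(self, frames, **losses):
        self.frames += frames
        for name, value in losses.items():
            self.sums[name] += value * frames   # store frame-weighted sums
    def averages(self):
        return {k: round(v / self.frames, 5) for k, v in self.sums.items()}

tot = LossTracker()
tot.update(14877, loss=0.05861, simple_loss=0.08007)
tot.update(15671, loss=0.0661, simple_loss=0.09658)
print(tot.averages())  # {'loss': 0.06245, 'simple_loss': 0.08854}
```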
2023-11-27 00:18:07,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3638920.0, ans=0.125 2023-11-27 00:18:18,633 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545850 2023-11-27 00:18:42,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3639120.0, ans=0.2 2023-11-27 00:18:43,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3639120.0, ans=0.125 2023-11-27 00:18:51,448 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4800, loss[loss=0.05861, simple_loss=0.08007, pruned_loss=0.006507, audio_tagging_loss=0.01206, over 14877.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08871, pruned_loss=0.01203, audio_tagging_loss=0.008975, over 3041652.34 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:18:52,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3639186.6666666665, ans=0.2 2023-11-27 00:18:59,442 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 8.803e+01 9.667e+01 1.040e+02 1.360e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 00:19:10,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0 2023-11-27 00:19:14,545 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545900 2023-11-27 00:19:32,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3639386.6666666665, ans=0.1 2023-11-27 00:19:44,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3639453.3333333335, ans=0.0 2023-11-27 00:19:48,912 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4850, loss[loss=0.0661, simple_loss=0.09658, pruned_loss=0.00914, audio_tagging_loss=0.008672, over 15671.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.0882, pruned_loss=0.012, audio_tagging_loss=0.009041, over 3036328.69 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:19:50,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3639520.0, ans=0.2 2023-11-27 00:19:55,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3639520.0, ans=0.125 2023-11-27 00:19:56,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3639520.0, ans=0.2 2023-11-27 00:20:06,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3639586.6666666665, ans=0.2 2023-11-27 00:20:07,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.45 vs.
limit=15.0 2023-11-27 00:20:10,930 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545950 2023-11-27 00:20:11,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3639653.3333333335, ans=0.0 2023-11-27 00:20:27,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3639720.0, ans=0.125 2023-11-27 00:20:33,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.89 vs. limit=22.5 2023-11-27 00:20:35,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.79 vs. limit=22.5 2023-11-27 00:20:38,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3639786.6666666665, ans=0.1 2023-11-27 00:20:43,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3639786.6666666665, ans=0.125 2023-11-27 00:20:44,964 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4900, loss[loss=0.07195, simple_loss=0.1056, pruned_loss=0.01106, audio_tagging_loss=0.00809, over 14693.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08914, pruned_loss=0.01212, audio_tagging_loss=0.008989, over 3034357.06 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:20:49,495 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:20:50,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3639853.3333333335, ans=0.05 2023-11-27 00:20:52,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.929e+01 9.407e+01 1.023e+02 1.723e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 00:20:56,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0 2023-11-27 00:21:06,442 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546000 2023-11-27 00:21:15,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3639986.6666666665, ans=0.0 2023-11-27 00:21:28,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3640120.0, ans=0.0 2023-11-27 00:21:34,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3640120.0, ans=0.1 2023-11-27 00:21:39,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3640186.6666666665, ans=0.1 2023-11-27 00:21:40,255 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4950, loss[loss=0.06681, simple_loss=0.09214, pruned_loss=0.0135, audio_tagging_loss=0.007245, over 14794.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08985, pruned_loss=0.01235, audio_tagging_loss=0.008785, over 3030366.93 frames. 
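], batch size: 55, lr: 1.47e-03, grad_scale: 32.0

The dense scaling.py:213 traffic prints ScheduledFloat values: hyperparameters such as dropout_p, the various skip rates, and balancer limits are functions of the global batch count rather than constants, and the log records the value in effect (ans=...) each time one is evaluated. A sketch of the idea as a piecewise-linear schedule with clamped endpoints; the breakpoints below are illustrative, not the ones used in this run:

```python
def scheduled_float(batch_count: float, *points: tuple) -> float:
    # points: (batch_count, value) breakpoints, linearly interpolated,
    # clamped to the first/last value outside the covered range.
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
    return points[-1][1]

# A dropout annealed 0.3 -> 0.1 over the first 16k batches reads ans=0.1
# from then on, matching the constant dropout_p=0.1 entries at batch_count~3.64e6:
print(scheduled_float(3_640_186.0, (0.0, 0.3), (16_000.0, 0.1)))  # 0.1
print(scheduled_float(8_000.0, (0.0, 0.3), (16_000.0, 0.1)))      # 0.2
```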
2023-11-27 00:21:44,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3640186.6666666665, ans=0.0 2023-11-27 00:21:56,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3640253.3333333335, ans=0.1 2023-11-27 00:22:02,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3640320.0, ans=0.05 2023-11-27 00:22:03,002 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546050 2023-11-27 00:22:03,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3640320.0, ans=0.2 2023-11-27 00:22:04,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3640320.0, ans=0.125 2023-11-27 00:22:30,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3640453.3333333335, ans=0.125 2023-11-27 00:22:35,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3640520.0, ans=0.0 2023-11-27 00:22:35,935 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5000, loss[loss=0.0781, simple_loss=0.1165, pruned_loss=0.01188, audio_tagging_loss=0.007952, over 15476.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09019, pruned_loss=0.0124, audio_tagging_loss=0.008669, over 3032025.82 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:22:44,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.925e+01 9.606e+01 1.023e+02 1.240e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 00:22:59,173 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546100 2023-11-27 00:23:07,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3640653.3333333335, ans=0.125 2023-11-27 00:23:32,438 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5050, loss[loss=0.07351, simple_loss=0.1034, pruned_loss=0.009368, audio_tagging_loss=0.01246, over 15059.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08957, pruned_loss=0.0121, audio_tagging_loss=0.008668, over 3034315.10 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:23:34,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3640853.3333333335, ans=0.0 2023-11-27 00:23:43,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0 2023-11-27 00:23:46,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=22.5 2023-11-27 00:23:54,313 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546150 2023-11-27 00:24:13,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3641053.3333333335, ans=0.125 2023-11-27 00:24:17,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.84 vs.
limit=10.0 2023-11-27 00:24:24,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=22.5 2023-11-27 00:24:25,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3641120.0, ans=0.125 2023-11-27 00:24:25,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3641120.0, ans=0.125 2023-11-27 00:24:28,508 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5100, loss[loss=0.05403, simple_loss=0.07658, pruned_loss=0.006339, audio_tagging_loss=0.009399, over 13852.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08895, pruned_loss=0.01195, audio_tagging_loss=0.008687, over 3045389.62 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:24:34,127 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:24:36,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.921e+01 9.596e+01 1.036e+02 1.225e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-27 00:24:38,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3641253.3333333335, ans=0.125 2023-11-27 00:24:51,074 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546200 2023-11-27 00:24:59,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3641320.0, ans=10.0 2023-11-27 00:25:00,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-27 00:25:00,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. limit=10.0 2023-11-27 00:25:20,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3641453.3333333335, ans=0.0 2023-11-27 00:25:22,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3641453.3333333335, ans=0.025 2023-11-27 00:25:24,928 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5150, loss[loss=0.05372, simple_loss=0.07405, pruned_loss=0.008709, audio_tagging_loss=0.007988, over 15539.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08841, pruned_loss=0.01188, audio_tagging_loss=0.008707, over 3036215.33 frames. 
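], batch size: 58, lr: 1.47e-03, grad_scale: 32.0

The optim.py:476 lines summarize recent gradient norms as a five-number summary (min, 25%, median, 75%, max) plus a clipping threshold; throughout this section the printed threshold tracks Clipping_scale times the median (e.g. 2.0 * 9.488e+01 = 1.898e+02 earlier in the log), and with quartiles this tight percent-clipped stays at 0.0. A sketch of the report; the statistics window the optimizer actually uses is an assumption:

```python
import numpy as np

def clipping_report(grad_norms, clipping_scale=2.0):
    # Five-number summary of recent grad norms; the threshold follows the
    # clipping_scale * median relationship visible in the log.
    qs = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * qs[2]
    pct = 100.0 * np.mean(np.asarray(grad_norms) > threshold)
    print("grad-norm quartiles", " ".join(f"{q:.3e}" for q in qs),
          f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")

rng = np.random.default_rng(0)
clipping_report(rng.normal(95.0, 12.0, size=200))
```

The separate grad_scale field in the loss lines (stepping among 8.0, 16.0 and 32.0 here) looks like the mixed-precision loss scale, which is halved on overflow and doubled after a run of clean steps; it is distinct from this norm-based clipping.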
2023-11-27 00:25:25,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3641520.0, ans=0.0 2023-11-27 00:25:47,978 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546250 2023-11-27 00:25:52,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3641653.3333333335, ans=0.5 2023-11-27 00:25:56,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3641653.3333333335, ans=0.125 2023-11-27 00:26:20,888 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5200, loss[loss=0.05319, simple_loss=0.06624, pruned_loss=0.008969, audio_tagging_loss=0.0111, over 13633.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08887, pruned_loss=0.01204, audio_tagging_loss=0.008587, over 3033181.88 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:26:29,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 9.022e+01 9.726e+01 1.018e+02 1.270e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 00:26:38,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.53 vs. limit=10.0 2023-11-27 00:26:43,156 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546300 2023-11-27 00:26:43,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.71 vs. limit=22.5 2023-11-27 00:27:16,516 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5250, loss[loss=0.07078, simple_loss=0.09805, pruned_loss=0.01514, audio_tagging_loss=0.006614, over 16759.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08888, pruned_loss=0.01207, audio_tagging_loss=0.00852, over 3037135.90 frames.
], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:27:18,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3642186.6666666665, ans=0.125 2023-11-27 00:27:28,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3642253.3333333335, ans=0.0 2023-11-27 00:27:30,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3642253.3333333335, ans=0.125 2023-11-27 00:27:30,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3642253.3333333335, ans=0.125 2023-11-27 00:27:38,971 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546350 2023-11-27 00:27:42,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3642320.0, ans=0.2 2023-11-27 00:27:50,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3642386.6666666665, ans=0.125 2023-11-27 00:27:56,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3642386.6666666665, ans=0.1 2023-11-27 00:27:56,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3642386.6666666665, ans=0.2 2023-11-27 00:27:58,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3642386.6666666665, ans=0.0 2023-11-27 00:28:08,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3642453.3333333335, ans=0.125 2023-11-27 00:28:08,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3642453.3333333335, ans=0.1 2023-11-27 00:28:11,781 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5300, loss[loss=0.06079, simple_loss=0.07649, pruned_loss=0.01323, audio_tagging_loss=0.009319, over 15294.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08988, pruned_loss=0.01221, audio_tagging_loss=0.008485, over 3036992.84 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:28:21,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.52 vs. limit=10.0 2023-11-27 00:28:22,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.037e+01 9.686e+01 1.067e+02 1.240e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 00:28:29,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.12 vs. limit=15.0 2023-11-27 00:28:35,407 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546400 2023-11-27 00:28:50,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.12 vs. limit=12.0 2023-11-27 00:29:08,578 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5350, loss[loss=0.05885, simple_loss=0.08108, pruned_loss=0.008765, audio_tagging_loss=0.009543, over 15037.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09, pruned_loss=0.0123, audio_tagging_loss=0.008538, over 3035549.65 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:29:11,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3642853.3333333335, ans=0.1 2023-11-27 00:29:31,159 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546450 2023-11-27 00:29:31,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=12.0 2023-11-27 00:29:52,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3643120.0, ans=0.0 2023-11-27 00:30:05,144 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5400, loss[loss=0.05557, simple_loss=0.07368, pruned_loss=0.009093, audio_tagging_loss=0.00964, over 14078.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08953, pruned_loss=0.01224, audio_tagging_loss=0.0086, over 3032237.67 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:30:11,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3643186.6666666665, ans=0.1 2023-11-27 00:30:14,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.994e+01 9.613e+01 1.047e+02 1.327e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 00:30:27,041 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546500 2023-11-27 00:30:32,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3643320.0, ans=0.125 2023-11-27 00:30:47,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3643386.6666666665, ans=0.0 2023-11-27 00:30:55,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3643453.3333333335, ans=0.0 2023-11-27 00:31:00,371 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5450, loss[loss=0.06811, simple_loss=0.09411, pruned_loss=0.01416, audio_tagging_loss=0.006904, over 14770.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08942, pruned_loss=0.01211, audio_tagging_loss=0.00856, over 3029218.31 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:31:23,104 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546550 2023-11-27 00:31:39,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3643720.0, ans=0.0 2023-11-27 00:31:46,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3643786.6666666665, ans=0.125 2023-11-27 00:31:51,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3643786.6666666665, ans=0.0 2023-11-27 00:31:54,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3643786.6666666665, ans=0.1 2023-11-27 00:31:55,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3643853.3333333335, ans=0.2 2023-11-27 00:31:56,717 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5500, loss[loss=0.06785, simple_loss=0.09293, pruned_loss=0.01, audio_tagging_loss=0.01138, over 15591.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08974, pruned_loss=0.01205, audio_tagging_loss=0.008573, over 3025942.56 frames. 
], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:31:57,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3643853.3333333335, ans=0.1 2023-11-27 00:32:04,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3643853.3333333335, ans=0.2 2023-11-27 00:32:06,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.879e+01 9.698e+01 1.044e+02 1.314e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-27 00:32:09,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3643920.0, ans=0.125 2023-11-27 00:32:10,458 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:32:12,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3643920.0, ans=0.0 2023-11-27 00:32:16,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3643920.0, ans=0.07 2023-11-27 00:32:19,446 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546600 2023-11-27 00:32:47,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3644120.0, ans=0.125 2023-11-27 00:32:50,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3644120.0, ans=0.0 2023-11-27 00:32:53,008 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5550, loss[loss=0.06575, simple_loss=0.08776, pruned_loss=0.01141, audio_tagging_loss=0.01047, over 15904.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08936, pruned_loss=0.01203, audio_tagging_loss=0.008728, over 3023855.55 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:33:02,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3644186.6666666665, ans=0.125 2023-11-27 00:33:03,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3644253.3333333335, ans=0.125 2023-11-27 00:33:07,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3644253.3333333335, ans=0.0 2023-11-27 00:33:07,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3644253.3333333335, ans=0.0 2023-11-27 00:33:15,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546650 2023-11-27 00:33:20,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=22.5 2023-11-27 00:33:28,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3644386.6666666665, ans=0.95 2023-11-27 00:33:37,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3644453.3333333335, ans=0.1 2023-11-27 00:33:48,618 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5600, loss[loss=0.06492, simple_loss=0.08625, pruned_loss=0.01068, audio_tagging_loss=0.01111, over 15585.00 frames. 
], tot_loss[loss=0.06603, simple_loss=0.09025, pruned_loss=0.01215, audio_tagging_loss=0.008755, over 3032839.49 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:33:54,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-27 00:33:58,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.835e+01 9.433e+01 1.028e+02 1.297e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 00:34:11,088 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546700 2023-11-27 00:34:11,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3644653.3333333335, ans=0.0 2023-11-27 00:34:28,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3644720.0, ans=0.0 2023-11-27 00:34:28,953 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:34:30,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3644720.0, ans=0.125 2023-11-27 00:34:35,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3644786.6666666665, ans=0.05 2023-11-27 00:34:44,709 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5650, loss[loss=0.07894, simple_loss=0.1114, pruned_loss=0.01633, audio_tagging_loss=0.006906, over 15383.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09019, pruned_loss=0.01212, audio_tagging_loss=0.00889, over 3038133.00 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:34:46,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-11-27 00:34:58,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3644920.0, ans=0.95 2023-11-27 00:35:06,495 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546750 2023-11-27 00:35:06,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3644986.6666666665, ans=0.0 2023-11-27 00:35:13,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3644986.6666666665, ans=0.125 2023-11-27 00:35:22,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.53 vs. 
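The WARNING above shows the filtering rule applied to the AudioSet placeholder transcripts: with 100 input frames, the 4x subsampling leaves 23 encoder frames, one fewer than the 24 BPE tokens, and a transducer cannot emit more tokens than it has frames. A sketch of the check, assuming a Conv2dSubsampling-style formula T_out = ((T_in - 7) // 2 + 1) // 2, which reproduces the logged 100 -> 23:

def frames_after_subsampling(num_frames):
    # Assumed subsampling arithmetic; gives 100 -> 23 as in the WARNING.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames, num_tokens):
    t = frames_after_subsampling(num_frames)
    if t < num_tokens:
        print(f"Exclude cut: frames before={num_frames}, "
              f"after={t}, tokens={num_tokens}")
        return False
    return True

keep_cut(100, 24)   # False: 23 frames < 24 tokens, cut is excluded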
limit=10.0 2023-11-27 00:35:27,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3645053.3333333335, ans=0.05 2023-11-27 00:35:28,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2023-11-27 00:35:31,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3645120.0, ans=0.2 2023-11-27 00:35:34,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3645120.0, ans=0.125 2023-11-27 00:35:40,811 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5700, loss[loss=0.05799, simple_loss=0.07325, pruned_loss=0.009106, audio_tagging_loss=0.01226, over 15016.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08939, pruned_loss=0.01207, audio_tagging_loss=0.008949, over 3039241.96 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:35:44,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2023-11-27 00:35:50,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.868e+01 8.853e+01 9.368e+01 1.022e+02 1.504e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 00:36:02,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0 2023-11-27 00:36:03,325 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546800 2023-11-27 00:36:03,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2023-11-27 00:36:06,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3645320.0, ans=0.125 2023-11-27 00:36:20,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3645386.6666666665, ans=0.125 2023-11-27 00:36:20,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-27 00:36:27,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3645453.3333333335, ans=0.125 2023-11-27 00:36:27,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3645453.3333333335, ans=0.0 2023-11-27 00:36:35,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3645520.0, ans=0.125 2023-11-27 00:36:36,197 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5750, loss[loss=0.05819, simple_loss=0.07969, pruned_loss=0.009079, audio_tagging_loss=0.009267, over 15671.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08953, pruned_loss=0.01202, audio_tagging_loss=0.008857, over 3042951.83 frames. 
], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:36:43,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3645520.0, ans=0.0 2023-11-27 00:36:59,311 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546850 2023-11-27 00:37:04,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3645653.3333333335, ans=0.125 2023-11-27 00:37:28,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3645786.6666666665, ans=0.125 2023-11-27 00:37:32,706 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5800, loss[loss=0.05713, simple_loss=0.0734, pruned_loss=0.01207, audio_tagging_loss=0.008359, over 16578.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09027, pruned_loss=0.01216, audio_tagging_loss=0.008662, over 3042293.84 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:37:39,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3645853.3333333335, ans=0.1 2023-11-27 00:37:42,696 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.951e+01 9.661e+01 1.044e+02 1.253e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 00:37:46,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3645920.0, ans=0.125 2023-11-27 00:37:52,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3645920.0, ans=0.125 2023-11-27 00:37:55,094 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546900 2023-11-27 00:37:55,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3645986.6666666665, ans=0.1 2023-11-27 00:38:12,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3646053.3333333335, ans=0.125 2023-11-27 00:38:14,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3646053.3333333335, ans=0.125 2023-11-27 00:38:22,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3646120.0, ans=0.125 2023-11-27 00:38:22,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.03 vs. limit=10.0 2023-11-27 00:38:28,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3646186.6666666665, ans=0.125 2023-11-27 00:38:29,076 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5850, loss[loss=0.05665, simple_loss=0.07804, pruned_loss=0.008245, audio_tagging_loss=0.009383, over 16005.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09016, pruned_loss=0.01216, audio_tagging_loss=0.00857, over 3039173.44 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:38:43,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.81 vs. 
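The scaling.py "Whitening: ... metric=X vs. limit=Y" lines compare a per-module anisotropy measure of the activations against that module's whitening limit (6.0, 10.0, 12.0, 15.0 or 22.5 here); the constraint only bites when the metric exceeds the limit. One plausible reading of the metric is the ratio of the second moment of the feature-covariance eigenvalues to their squared mean, which equals 1.0 for perfectly white features and grows with anisotropy. The sketch below computes that ratio per channel group; it is an illustration of the idea, not the exact scaling.py code:

import torch

def whitening_metric(x, num_groups):
    # x: (num_frames, num_channels). Mean over groups of
    # E[lambda^2] / E[lambda]^2 for the group covariance eigenvalues.
    n, c = x.shape
    g = c // num_groups
    metrics = []
    for i in range(num_groups):
        xg = x[:, i * g:(i + 1) * g]
        cov = xg.T @ xg / n
        eig = torch.linalg.eigvalsh(cov)
        metrics.append((eig ** 2).mean() / eig.mean() ** 2)
    return torch.stack(metrics).mean().item()

x = torch.randn(2000, 384)        # roughly white features
print(whitening_metric(x, 1))     # a little above 1.0, far below limit=15.0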
limit=15.0 2023-11-27 00:38:50,967 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546950 2023-11-27 00:38:53,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3646320.0, ans=0.0 2023-11-27 00:39:08,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3646386.6666666665, ans=0.0 2023-11-27 00:39:12,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3646453.3333333335, ans=0.125 2023-11-27 00:39:21,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3646453.3333333335, ans=0.0 2023-11-27 00:39:24,482 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5900, loss[loss=0.07134, simple_loss=0.08135, pruned_loss=0.01959, audio_tagging_loss=0.01108, over 15042.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09059, pruned_loss=0.0122, audio_tagging_loss=0.00849, over 3038309.20 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:39:27,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3646520.0, ans=0.125 2023-11-27 00:39:34,487 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.740e+01 9.357e+01 9.859e+01 1.378e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 00:39:45,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3646586.6666666665, ans=0.1 2023-11-27 00:39:47,247 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547000 2023-11-27 00:39:52,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-11-27 00:40:10,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3646786.6666666665, ans=0.1 2023-11-27 00:40:19,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3646786.6666666665, ans=0.1 2023-11-27 00:40:20,824 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5950, loss[loss=0.07169, simple_loss=0.1004, pruned_loss=0.01331, audio_tagging_loss=0.008182, over 14406.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09055, pruned_loss=0.01225, audio_tagging_loss=0.008548, over 3048929.60 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:40:34,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3646920.0, ans=0.125 2023-11-27 00:40:43,317 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547050 2023-11-27 00:41:16,166 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6000, loss[loss=0.08278, simple_loss=0.1185, pruned_loss=0.0154, audio_tagging_loss=0.008144, over 15234.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09069, pruned_loss=0.01227, audio_tagging_loss=0.00855, over 3044123.17 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:41:16,167 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 00:41:48,440 INFO [train_asr.py:1267] (1/4) Epoch 46, validation: loss=0.05759, simple_loss=0.05057, pruned_loss=0.005367, audio_tagging_loss=0.02694, over 4681554.00 frames. 2023-11-27 00:41:48,441 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 00:41:53,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3647186.6666666665, ans=0.1 2023-11-27 00:41:58,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.712e+01 9.506e+01 1.018e+02 1.169e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-27 00:42:03,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3647253.3333333335, ans=0.125 2023-11-27 00:42:08,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.07 vs. limit=22.5 2023-11-27 00:42:10,639 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547100 2023-11-27 00:42:14,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3647320.0, ans=0.125 2023-11-27 00:42:27,953 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:42:44,302 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6050, loss[loss=0.06186, simple_loss=0.09317, pruned_loss=0.007906, audio_tagging_loss=0.007372, over 16294.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08902, pruned_loss=0.01206, audio_tagging_loss=0.008582, over 3038911.38 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:42:49,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3647520.0, ans=0.125 2023-11-27 00:42:55,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-27 00:43:02,028 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:43:02,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.62 vs. 
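At batch 6000 the loop pauses for validation and then reports the peak allocator usage ("Maximum memory allocated so far is 25568MB"). Both pieces are straightforward with PyTorch; a minimal sketch, where model and dataloader stand in for the real objects and the model is assumed to return a summed loss plus a frame count:

import torch

@torch.no_grad()
def validation_loss(model, dataloader):
    model.eval()
    tot, frames = 0.0, 0
    for batch in dataloader:
        loss, n = model(batch)       # placeholder: (summed loss, #frames)
        tot, frames = tot + loss.item(), frames + n
    model.train()
    return tot / frames              # frame-normalized, like the logged losses

def log_peak_memory(device):
    # Requires a CUDA device. max_memory_allocated returns bytes since
    # startup (or the last reset); printed as whole MB it matches the
    # "25568MB" style above.
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")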
limit=15.0 2023-11-27 00:43:06,136 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547150 2023-11-27 00:43:17,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3647720.0, ans=0.125 2023-11-27 00:43:21,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3647720.0, ans=0.035 2023-11-27 00:43:36,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3647786.6666666665, ans=0.1 2023-11-27 00:43:40,363 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6100, loss[loss=0.07129, simple_loss=0.09213, pruned_loss=0.01595, audio_tagging_loss=0.009273, over 15468.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08894, pruned_loss=0.01211, audio_tagging_loss=0.008598, over 3044414.87 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:43:45,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3647853.3333333335, ans=0.125 2023-11-27 00:43:49,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 8.942e+01 9.763e+01 1.039e+02 1.274e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 00:43:59,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3647920.0, ans=0.0 2023-11-27 00:44:00,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3647986.6666666665, ans=0.95 2023-11-27 00:44:01,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547200 2023-11-27 00:44:35,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3648186.6666666665, ans=0.125 2023-11-27 00:44:35,851 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6150, loss[loss=0.09067, simple_loss=0.126, pruned_loss=0.02052, audio_tagging_loss=0.007152, over 15600.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09026, pruned_loss=0.01247, audio_tagging_loss=0.008674, over 3046263.38 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:44:47,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3648253.3333333335, ans=0.125 2023-11-27 00:44:58,686 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547250 2023-11-27 00:45:01,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3648320.0, ans=0.0 2023-11-27 00:45:14,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3648386.6666666665, ans=0.0 2023-11-27 00:45:19,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3648453.3333333335, ans=0.0 2023-11-27 00:45:19,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3648453.3333333335, ans=0.0 2023-11-27 00:45:31,500 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6200, loss[loss=0.05909, simple_loss=0.08368, pruned_loss=0.008599, audio_tagging_loss=0.008653, over 13804.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08897, pruned_loss=0.01218, audio_tagging_loss=0.008763, over 3042784.94 frames. 
], batch size: 52, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:45:31,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3648520.0, ans=0.125 2023-11-27 00:45:42,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3648586.6666666665, ans=0.02 2023-11-27 00:45:43,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.925e+01 9.447e+01 1.055e+02 1.440e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 00:45:52,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3648586.6666666665, ans=0.125 2023-11-27 00:45:54,302 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547300 2023-11-27 00:45:55,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3648653.3333333335, ans=0.125 2023-11-27 00:46:00,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.87 vs. limit=10.0 2023-11-27 00:46:28,169 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6250, loss[loss=0.06225, simple_loss=0.08177, pruned_loss=0.0102, audio_tagging_loss=0.01117, over 14586.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08999, pruned_loss=0.01225, audio_tagging_loss=0.008786, over 3050137.62 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:46:30,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3648853.3333333335, ans=10.0 2023-11-27 00:46:44,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=22.5 2023-11-27 00:46:47,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3648920.0, ans=0.0 2023-11-27 00:46:49,449 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547350 2023-11-27 00:46:51,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3648986.6666666665, ans=0.0 2023-11-27 00:46:53,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3648986.6666666665, ans=0.2 2023-11-27 00:47:13,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3649120.0, ans=0.2 2023-11-27 00:47:22,755 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6300, loss[loss=0.07201, simple_loss=0.1023, pruned_loss=0.009328, audio_tagging_loss=0.01154, over 14706.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.0905, pruned_loss=0.01227, audio_tagging_loss=0.008788, over 3052431.57 frames. 
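A consistency check on the loss columns: throughout this section the totals satisfy loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, i.e. the simple (lattice) loss is down-weighted by half while the pruned RNN-T loss and the audio-tagging distillation loss enter at full weight. For the batch 6250 running average above: 0.5 * 0.08999 + 0.01225 + 0.008786 = 0.06603. A one-liner to verify, with the scales written out as assumptions inferred from the logged numbers:

def total_loss(simple, pruned, audio_tagging,
               simple_scale=0.5, at_scale=1.0):
    # Weighting inferred from the logged totals in this run.
    return simple_scale * simple + pruned + at_scale * audio_tagging

# Batch 6250 tot_loss above: loss=0.06603
print(round(total_loss(0.08999, 0.01225, 0.008786), 5))   # 0.06603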
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:47:33,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.827e+01 9.482e+01 1.035e+02 1.564e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 00:47:39,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3649253.3333333335, ans=0.125 2023-11-27 00:47:45,066 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547400 2023-11-27 00:48:02,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3649386.6666666665, ans=0.0 2023-11-27 00:48:05,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3649386.6666666665, ans=0.2 2023-11-27 00:48:17,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3649520.0, ans=0.0 2023-11-27 00:48:18,661 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6350, loss[loss=0.05859, simple_loss=0.07125, pruned_loss=0.01222, audio_tagging_loss=0.01074, over 15373.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08948, pruned_loss=0.01192, audio_tagging_loss=0.00894, over 3049051.97 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:48:37,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5 2023-11-27 00:48:41,689 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547450 2023-11-27 00:48:48,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3649653.3333333335, ans=0.1 2023-11-27 00:49:02,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3649786.6666666665, ans=0.1 2023-11-27 00:49:03,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3649786.6666666665, ans=0.1 2023-11-27 00:49:06,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3649786.6666666665, ans=0.125 2023-11-27 00:49:15,277 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6400, loss[loss=0.05209, simple_loss=0.0719, pruned_loss=0.006603, audio_tagging_loss=0.009533, over 14626.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08818, pruned_loss=0.01176, audio_tagging_loss=0.009031, over 3039160.64 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:49:26,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.880e+01 9.472e+01 1.045e+02 1.391e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-27 00:49:37,183 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547500 2023-11-27 00:49:53,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3650053.3333333335, ans=0.1 2023-11-27 00:50:10,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3650186.6666666665, ans=0.0 2023-11-27 00:50:11,004 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6450, loss[loss=0.06201, simple_loss=0.08365, pruned_loss=0.009832, audio_tagging_loss=0.01035, over 15417.00 frames. 
], tot_loss[loss=0.06457, simple_loss=0.08758, pruned_loss=0.01162, audio_tagging_loss=0.009165, over 3037408.33 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:50:17,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2023-11-27 00:50:30,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2023-11-27 00:50:33,256 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547550 2023-11-27 00:50:36,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3650320.0, ans=0.0 2023-11-27 00:50:46,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3650386.6666666665, ans=0.125 2023-11-27 00:50:48,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3650386.6666666665, ans=0.95 2023-11-27 00:50:54,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2023-11-27 00:50:55,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-27 00:50:56,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-27 00:51:05,928 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6500, loss[loss=0.06421, simple_loss=0.08576, pruned_loss=0.01225, audio_tagging_loss=0.009078, over 16286.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08782, pruned_loss=0.01164, audio_tagging_loss=0.009052, over 3035424.16 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:51:17,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.951e+01 9.386e+01 1.000e+02 1.193e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-27 00:51:29,053 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547600 2023-11-27 00:52:02,831 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6550, loss[loss=0.05385, simple_loss=0.06751, pruned_loss=0.01073, audio_tagging_loss=0.00936, over 15872.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08833, pruned_loss=0.01179, audio_tagging_loss=0.008868, over 3037223.78 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:52:09,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3650853.3333333335, ans=0.1 2023-11-27 00:52:19,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3650920.0, ans=0.125 2023-11-27 00:52:25,144 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547650 2023-11-27 00:52:35,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3651053.3333333335, ans=0.0 2023-11-27 00:52:58,297 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6600, loss[loss=0.05217, simple_loss=0.0657, pruned_loss=0.01022, audio_tagging_loss=0.009096, over 15498.00 frames. 
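The grad_scale field in the loss lines is the dynamic fp16 loss scale: it drops when overflows are detected (32 -> 16 -> 8 near the start of this section) and doubles back up after long runs of clean steps (it is back at 32 by batch 6400 above). This is the standard torch.cuda.amp pattern; model and optimizer below are placeholders:

import torch

scaler = torch.cuda.amp.GradScaler()    # maintains the dynamic grad_scale

def fp16_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)              # placeholder forward -> scalar loss
    scaler.scale(loss).backward()        # scale up to protect small fp16 grads
    scaler.step(optimizer)               # unscales; skips the step on inf/nan
    scaler.update()                      # halve on overflow, grow when stable
    return scaler.get_scale()            # the value logged as grad_scale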
], tot_loss[loss=0.06501, simple_loss=0.08884, pruned_loss=0.0119, audio_tagging_loss=0.008688, over 3032870.13 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:53:03,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3651186.6666666665, ans=0.125 2023-11-27 00:53:08,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3651253.3333333335, ans=0.125 2023-11-27 00:53:09,340 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.821e+01 9.435e+01 1.031e+02 1.384e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 00:53:21,056 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547700 2023-11-27 00:53:38,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0 2023-11-27 00:53:44,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3651453.3333333335, ans=0.2 2023-11-27 00:53:50,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3651453.3333333335, ans=0.125 2023-11-27 00:53:54,135 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6650, loss[loss=0.05903, simple_loss=0.07881, pruned_loss=0.01243, audio_tagging_loss=0.007192, over 15530.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08837, pruned_loss=0.01201, audio_tagging_loss=0.00863, over 3039708.28 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:54:13,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3651586.6666666665, ans=0.125 2023-11-27 00:54:14,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0 2023-11-27 00:54:17,103 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547750 2023-11-27 00:54:17,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3651653.3333333335, ans=0.0 2023-11-27 00:54:23,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=12.0 2023-11-27 00:54:25,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.67 vs. limit=22.5 2023-11-27 00:54:49,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3651853.3333333335, ans=0.125 2023-11-27 00:54:50,318 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6700, loss[loss=0.04232, simple_loss=0.0568, pruned_loss=0.005442, audio_tagging_loss=0.008472, over 15083.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08818, pruned_loss=0.01191, audio_tagging_loss=0.008537, over 3037528.64 frames. 
], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:55:01,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.865e+01 9.450e+01 1.017e+02 1.235e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 00:55:02,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3651920.0, ans=0.125 2023-11-27 00:55:03,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3651920.0, ans=0.1 2023-11-27 00:55:04,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2023-11-27 00:55:12,875 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547800 2023-11-27 00:55:22,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3652053.3333333335, ans=0.125 2023-11-27 00:55:23,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3652053.3333333335, ans=0.09899494936611666 2023-11-27 00:55:34,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3652120.0, ans=0.0 2023-11-27 00:55:35,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3652120.0, ans=0.125 2023-11-27 00:55:46,478 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6750, loss[loss=0.0566, simple_loss=0.07977, pruned_loss=0.008029, audio_tagging_loss=0.008683, over 13573.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08826, pruned_loss=0.01182, audio_tagging_loss=0.008636, over 3042011.34 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:55:53,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3652186.6666666665, ans=0.1 2023-11-27 00:55:57,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2023-11-27 00:55:57,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3652253.3333333335, ans=0.0 2023-11-27 00:56:09,333 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547850 2023-11-27 00:56:30,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3652453.3333333335, ans=0.0 2023-11-27 00:56:31,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3652453.3333333335, ans=0.0 2023-11-27 00:56:42,055 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6800, loss[loss=0.06106, simple_loss=0.08104, pruned_loss=0.01021, audio_tagging_loss=0.01034, over 15774.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08783, pruned_loss=0.01174, audio_tagging_loss=0.008654, over 3046623.79 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:56:48,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3652520.0, ans=12.0 2023-11-27 00:56:53,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.866e+01 8.978e+01 9.815e+01 1.051e+02 1.384e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-27 00:56:58,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=3652586.6666666665, ans=12.0 2023-11-27 00:57:04,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=22.5 2023-11-27 00:57:04,902 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547900 2023-11-27 00:57:09,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3652653.3333333335, ans=0.125 2023-11-27 00:57:21,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2023-11-27 00:57:38,349 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6850, loss[loss=0.06504, simple_loss=0.0919, pruned_loss=0.01113, audio_tagging_loss=0.007971, over 16054.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08767, pruned_loss=0.01177, audio_tagging_loss=0.008571, over 3047459.73 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:57:44,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3652853.3333333335, ans=0.125 2023-11-27 00:58:00,231 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547950 2023-11-27 00:58:15,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3653053.3333333335, ans=0.125 2023-11-27 00:58:18,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3653053.3333333335, ans=0.5 2023-11-27 00:58:22,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3653120.0, ans=0.125 2023-11-27 00:58:34,327 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6900, loss[loss=0.05229, simple_loss=0.07079, pruned_loss=0.007522, audio_tagging_loss=0.009375, over 14320.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08829, pruned_loss=0.01181, audio_tagging_loss=0.008591, over 3043650.53 frames. 
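Each train_asr.py loss line pairs the current batch (loss[...] over that batch's frames) with a running aggregate (tot_loss[...] over ~3.0M frames); the roughly constant frame count suggests a frame-weighted average with forgetting rather than a plain epoch mean. A sketch of such a tracker, with the decay picked only so the effective window lands near 3M frames; the real averaging/reset scheme may differ:

class RunningFrameAverage:
    # Frame-weighted running average with exponential forgetting.
    # With ~15k frames/batch, decay=0.995 gives a steady-state window
    # of about 15000 / 0.005 = 3M frames, matching the logged counts.
    def __init__(self, decay=0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self):
        return self.loss_sum / max(self.frames, 1.0)

avg = RunningFrameAverage()
avg.update(0.06106, 15774)    # batch 6800's loss[...] entry above
print(f"tot_loss[loss={avg.value:.5f}, over {avg.frames:.2f} frames.]")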
], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:58:40,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3653186.6666666665, ans=0.2 2023-11-27 00:58:45,071 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.904e+01 9.598e+01 1.032e+02 1.208e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 00:58:56,294 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548000 2023-11-27 00:59:08,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3653386.6666666665, ans=0.0 2023-11-27 00:59:09,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3653386.6666666665, ans=0.0 2023-11-27 00:59:13,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.71 vs. limit=15.0 2023-11-27 00:59:15,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3653386.6666666665, ans=0.5 2023-11-27 00:59:15,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3653386.6666666665, ans=0.0 2023-11-27 00:59:16,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3653386.6666666665, ans=0.0 2023-11-27 00:59:16,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-11-27 00:59:19,639 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:59:24,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0 2023-11-27 00:59:31,362 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6950, loss[loss=0.04781, simple_loss=0.06465, pruned_loss=0.00664, audio_tagging_loss=0.008846, over 14125.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08813, pruned_loss=0.01178, audio_tagging_loss=0.008573, over 3035416.18 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:59:49,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3653586.6666666665, ans=0.1 2023-11-27 00:59:54,810 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548050 2023-11-27 01:00:27,987 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7000, loss[loss=0.0642, simple_loss=0.08763, pruned_loss=0.009816, audio_tagging_loss=0.01057, over 13919.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08807, pruned_loss=0.01174, audio_tagging_loss=0.008563, over 3035923.67 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 01:00:35,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3653853.3333333335, ans=0.125 2023-11-27 01:00:39,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.912e+01 9.354e+01 1.017e+02 1.441e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 01:00:43,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3653920.0, ans=0.125 2023-11-27 01:00:46,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-11-27 01:00:49,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548100 2023-11-27 01:01:23,271 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7050, loss[loss=0.04303, simple_loss=0.05084, pruned_loss=0.007281, audio_tagging_loss=0.01033, over 16468.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08818, pruned_loss=0.01187, audio_tagging_loss=0.008664, over 3037113.03 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:01:29,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3654186.6666666665, ans=0.0 2023-11-27 01:01:34,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3654253.3333333335, ans=0.125 2023-11-27 01:01:37,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-11-27 01:01:41,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3654253.3333333335, ans=0.125 2023-11-27 01:01:44,561 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548150 2023-11-27 01:01:47,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-11-27 01:01:50,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-27 01:01:54,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2023-11-27 01:02:04,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3654386.6666666665, ans=0.2 2023-11-27 01:02:18,133 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7100, loss[loss=0.05852, simple_loss=0.07396, pruned_loss=0.01175, audio_tagging_loss=0.009793, over 15407.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08843, pruned_loss=0.0118, audio_tagging_loss=0.008678, over 3038043.39 frames. 
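The balancer entries scattered through these lines (...balancer.prob, ...balancer.max_positive, ...balancer_na.min_abs, and so on) parameterize modules that nudge per-channel activation statistics into a target range: max_positive=0.95 caps the fraction of positive values in a channel, min_abs=0.02 floors the mean absolute value, and prob is the probability the constraint is applied on a given step. The real Balancer enforces this through gradient corrections; the sketch below only measures the statistics being constrained, as a diagnostic:

import torch

def balancer_stats(x, min_positive=0.05, max_positive=0.95,
                   min_abs=0.02, max_abs=100.0):
    # x: (num_frames, num_channels). Count channels whose statistics
    # fall outside the Balancer's target ranges.
    frac_pos = (x > 0).float().mean(dim=0)   # fraction positive per channel
    mean_abs = x.abs().mean(dim=0)           # mean |activation| per channel
    bad_pos = ((frac_pos < min_positive) | (frac_pos > max_positive)).sum()
    bad_abs = ((mean_abs < min_abs) | (mean_abs > max_abs)).sum()
    print(f"channels outside positive-fraction range: {int(bad_pos)}, "
          f"outside abs-value range: {int(bad_abs)}")

balancer_stats(torch.randn(2000, 256))       # healthy activations: 0, 0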
], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:02:24,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3654520.0, ans=0.125 2023-11-27 01:02:27,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3654586.6666666665, ans=0.125 2023-11-27 01:02:30,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.909e+01 9.590e+01 1.018e+02 1.394e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 01:02:35,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3654586.6666666665, ans=0.0 2023-11-27 01:02:38,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=12.0 2023-11-27 01:02:40,404 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548200 2023-11-27 01:02:42,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3654653.3333333335, ans=0.02 2023-11-27 01:02:58,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3654720.0, ans=0.125 2023-11-27 01:02:59,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3654720.0, ans=0.125 2023-11-27 01:03:12,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3654853.3333333335, ans=0.1 2023-11-27 01:03:13,933 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7150, loss[loss=0.06517, simple_loss=0.08894, pruned_loss=0.01169, audio_tagging_loss=0.009016, over 16882.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08905, pruned_loss=0.01191, audio_tagging_loss=0.008691, over 3040876.30 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:03:15,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3654853.3333333335, ans=0.125 2023-11-27 01:03:32,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3654920.0, ans=0.0 2023-11-27 01:03:35,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3654986.6666666665, ans=0.0 2023-11-27 01:03:36,465 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548250 2023-11-27 01:04:09,608 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7200, loss[loss=0.06578, simple_loss=0.09462, pruned_loss=0.01032, audio_tagging_loss=0.008145, over 15108.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.0884, pruned_loss=0.01167, audio_tagging_loss=0.008703, over 3043057.42 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:04:12,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3655186.6666666665, ans=0.0 2023-11-27 01:04:17,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3655186.6666666665, ans=0.125 2023-11-27 01:04:18,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3655186.6666666665, ans=0.0 2023-11-27 01:04:20,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3655253.3333333335, ans=0.2 2023-11-27 01:04:22,355 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.112e+01 9.564e+01 1.040e+02 1.454e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 01:04:25,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3655253.3333333335, ans=0.125 2023-11-27 01:04:30,998 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548300 2023-11-27 01:04:36,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3655320.0, ans=0.125 2023-11-27 01:04:41,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.05 vs. limit=22.5 2023-11-27 01:04:41,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3655386.6666666665, ans=0.1 2023-11-27 01:04:55,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3655453.3333333335, ans=0.125 2023-11-27 01:05:04,729 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7250, loss[loss=0.06429, simple_loss=0.08276, pruned_loss=0.01173, audio_tagging_loss=0.01118, over 15734.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08885, pruned_loss=0.01186, audio_tagging_loss=0.008804, over 3042207.73 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:05:21,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2023-11-27 01:05:27,650 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548350 2023-11-27 01:05:27,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3655653.3333333335, ans=0.125 2023-11-27 01:05:38,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3655720.0, ans=0.04949747468305833 2023-11-27 01:05:40,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2023-11-27 01:05:45,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. 
limit=15.0 2023-11-27 01:05:47,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3655720.0, ans=0.125 2023-11-27 01:05:48,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3655786.6666666665, ans=0.125 2023-11-27 01:05:59,835 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7300, loss[loss=0.06454, simple_loss=0.0887, pruned_loss=0.01277, audio_tagging_loss=0.007417, over 14450.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08944, pruned_loss=0.01196, audio_tagging_loss=0.008718, over 3048969.48 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:06:08,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3655853.3333333335, ans=0.0 2023-11-27 01:06:14,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=12.0 2023-11-27 01:06:14,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.978e+01 9.664e+01 1.039e+02 1.460e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 01:06:14,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3655920.0, ans=0.0 2023-11-27 01:06:14,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3655920.0, ans=0.125 2023-11-27 01:06:22,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3655986.6666666665, ans=0.0 2023-11-27 01:06:22,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-27 01:06:23,173 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548400 2023-11-27 01:06:36,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3656053.3333333335, ans=0.1 2023-11-27 01:06:40,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3656053.3333333335, ans=0.1 2023-11-27 01:06:41,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3656053.3333333335, ans=0.04949747468305833 2023-11-27 01:06:48,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3656120.0, ans=0.2 2023-11-27 01:06:57,548 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7350, loss[loss=0.07876, simple_loss=0.1111, pruned_loss=0.0174, audio_tagging_loss=0.005813, over 15222.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08938, pruned_loss=0.01214, audio_tagging_loss=0.008603, over 3048521.55 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:06:59,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. 
limit=15.0 2023-11-27 01:07:01,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3656186.6666666665, ans=0.0 2023-11-27 01:07:05,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3656186.6666666665, ans=0.1 2023-11-27 01:07:18,765 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548450 2023-11-27 01:07:31,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0 2023-11-27 01:07:52,333 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7400, loss[loss=0.04961, simple_loss=0.06792, pruned_loss=0.006087, audio_tagging_loss=0.009558, over 15419.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08976, pruned_loss=0.0121, audio_tagging_loss=0.008472, over 3042200.50 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:08:05,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.855e+01 9.450e+01 1.015e+02 1.303e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 01:08:12,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3656586.6666666665, ans=0.0 2023-11-27 01:08:14,684 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548500 2023-11-27 01:08:37,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.99 vs. limit=15.0 2023-11-27 01:08:38,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3656786.6666666665, ans=0.125 2023-11-27 01:08:40,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3656786.6666666665, ans=0.0 2023-11-27 01:08:47,514 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7450, loss[loss=0.08637, simple_loss=0.1195, pruned_loss=0.01938, audio_tagging_loss=0.007254, over 14992.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08958, pruned_loss=0.01213, audio_tagging_loss=0.008452, over 3042509.26 frames. 
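The [model.py:807] status lines recur at a fixed cadence: the reported batch index advances by exactly 50 between consecutive entries (548400, 548450, 548500, ...). That is consistent with a periodic logging hook of roughly the following shape; a sketch under that assumption, with illustrative names, and the interval of 50 inferred from the log rather than taken from the code.

import logging

def maybe_log_status(batch_idx: int, freeze_encoder: bool = False,
                     every: int = 50):
    # The [model.py:807] entries appear every 50 batches, so a guard
    # like this presumably wraps the log call; 'every' is an inferred value.
    if batch_idx % every == 0:
        logging.info("Freeze_encoder: %s; Current batch idx: %d",
                     freeze_encoder, batch_idx)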
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:09:03,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3656920.0, ans=0.125 2023-11-27 01:09:06,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3656920.0, ans=0.125 2023-11-27 01:09:10,522 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548550 2023-11-27 01:09:20,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3657053.3333333335, ans=0.0 2023-11-27 01:09:28,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3657053.3333333335, ans=0.0 2023-11-27 01:09:33,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3657120.0, ans=0.125 2023-11-27 01:09:38,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3657120.0, ans=0.125 2023-11-27 01:09:43,442 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7500, loss[loss=0.0624, simple_loss=0.09394, pruned_loss=0.009775, audio_tagging_loss=0.005653, over 13854.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08991, pruned_loss=0.01211, audio_tagging_loss=0.008383, over 3042835.61 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:09:52,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3657186.6666666665, ans=0.125 2023-11-27 01:09:57,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.963e+01 9.690e+01 1.036e+02 1.410e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-27 01:10:05,777 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548600 2023-11-27 01:10:08,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3657320.0, ans=0.0 2023-11-27 01:10:30,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3657453.3333333335, ans=0.125 2023-11-27 01:10:39,350 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7550, loss[loss=0.05537, simple_loss=0.08164, pruned_loss=0.007364, audio_tagging_loss=0.007181, over 14975.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.0901, pruned_loss=0.01229, audio_tagging_loss=0.008488, over 3045634.08 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:10:55,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3657586.6666666665, ans=0.0 2023-11-27 01:11:01,661 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548650 2023-11-27 01:11:02,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3657653.3333333335, ans=0.125 2023-11-27 01:11:17,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3657720.0, ans=0.125 2023-11-27 01:11:34,187 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7600, loss[loss=0.04175, simple_loss=0.05073, pruned_loss=0.005016, audio_tagging_loss=0.01137, over 15447.00 frames. 
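The [scaling.py:213] ScheduledFloat entries report module hyperparameters (balancer probabilities, skip rates, dropout_p) together with the batch_count at which they were evaluated, i.e. these values are schedules over training progress rather than constants: the same parameter name reappears with a new ans as batch_count grows. A sketch of a piecewise-linear schedule of that kind, assuming interpolation between (batch_count, value) breakpoints; the breakpoints below are illustrative, not taken from the recipe.

import bisect

class PiecewiseLinearSchedule:
    # Float hyperparameter that varies with the global batch count; an
    # illustrative stand-in for what the 'ScheduledFloat: ...,
    # batch_count=..., ans=...' lines report. Breakpoints are made up.
    def __init__(self, *points):  # points: (batch_count, value), sorted
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = PiecewiseLinearSchedule((0, 0.5), (4000, 0.05), (16000, 0.0))
print(skip_rate(3654920.0))  # -> 0.0 once past the final breakpoint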
], tot_loss[loss=0.06474, simple_loss=0.08874, pruned_loss=0.01189, audio_tagging_loss=0.008478, over 3041509.49 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:11:35,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=15.0 2023-11-27 01:11:47,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.781e+01 9.560e+01 1.034e+02 1.331e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 01:11:48,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3657920.0, ans=0.125 2023-11-27 01:11:55,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3657920.0, ans=0.125 2023-11-27 01:11:57,245 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548700 2023-11-27 01:12:05,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3657986.6666666665, ans=15.0 2023-11-27 01:12:07,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3658053.3333333335, ans=0.1 2023-11-27 01:12:20,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3658120.0, ans=0.125 2023-11-27 01:12:30,374 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7650, loss[loss=0.06384, simple_loss=0.08146, pruned_loss=0.01365, audio_tagging_loss=0.00946, over 15450.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08833, pruned_loss=0.01198, audio_tagging_loss=0.008513, over 3034720.91 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:12:44,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3658253.3333333335, ans=0.125 2023-11-27 01:12:52,766 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548750 2023-11-27 01:12:57,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.43 vs. limit=10.0 2023-11-27 01:13:08,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.21 vs. limit=10.0 2023-11-27 01:13:26,561 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7700, loss[loss=0.04524, simple_loss=0.05692, pruned_loss=0.004627, audio_tagging_loss=0.01215, over 16248.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08783, pruned_loss=0.01197, audio_tagging_loss=0.00853, over 3038735.91 frames. 
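The [optim.py:476] entries obey a simple invariant: the clipping threshold is always exactly Clipping_scale times the logged median grad norm (in the entry above, 2.0 x 9.560e+01 = 1.912e+02, and likewise for every other entry in this section). A hedged reconstruction of such a scheme: track recent gradient norms, clip at clipping_scale times their median, and report quartiles plus the running fraction of clipped steps. The real optimizer may differ in its windowing and statistics.

from collections import deque
import torch

class MedianGradClipper:
    # Clip the global grad norm at clipping_scale * median of recent
    # norms, reporting quartiles and the running percent clipped, to
    # match the shape of the [optim.py:476] lines. Window size assumed.
    def __init__(self, params, clipping_scale=2.0, window=1000):
        self.params = [p for p in params]
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.clipped = 0
        self.steps = 0

    def step(self):
        grads = [p.grad for p in self.params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        self.norms.append(norm.item())
        s = sorted(self.norms)
        quartiles = [s[int(f * (len(s) - 1))]
                     for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]  # 2.0 * median
        self.steps += 1
        if norm.item() > threshold:
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        return quartiles, threshold, 100.0 * self.clipped / self.steps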
], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:13:40,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.982e+01 9.750e+01 1.038e+02 1.363e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 01:13:48,756 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548800 2023-11-27 01:14:03,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3658720.0, ans=0.125 2023-11-27 01:14:05,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3658720.0, ans=0.1 2023-11-27 01:14:11,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3658786.6666666665, ans=0.1 2023-11-27 01:14:21,516 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7750, loss[loss=0.0738, simple_loss=0.09701, pruned_loss=0.01583, audio_tagging_loss=0.009464, over 15049.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.0878, pruned_loss=0.01202, audio_tagging_loss=0.008643, over 3033938.10 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:14:44,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548850 2023-11-27 01:14:48,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3658986.6666666665, ans=0.125 2023-11-27 01:14:59,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3659053.3333333335, ans=0.125 2023-11-27 01:15:17,383 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7800, loss[loss=0.05497, simple_loss=0.07124, pruned_loss=0.009715, audio_tagging_loss=0.009639, over 16275.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08834, pruned_loss=0.01213, audio_tagging_loss=0.008585, over 3037760.79 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:15:17,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3659186.6666666665, ans=0.125 2023-11-27 01:15:19,700 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:15:21,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3659186.6666666665, ans=0.0 2023-11-27 01:15:29,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3659253.3333333335, ans=0.125 2023-11-27 01:15:31,596 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.098e+01 9.034e+01 9.648e+01 1.056e+02 1.237e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 01:15:39,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548900 2023-11-27 01:15:45,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3659320.0, ans=0.125 2023-11-27 01:15:46,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-27 01:16:12,937 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7850, loss[loss=0.0552, simple_loss=0.06568, pruned_loss=0.0113, audio_tagging_loss=0.01107, over 13446.00 frames. 
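grad_scale in the batch lines moves between powers of two (32.0 and 16.0 here, 8.0 and 4.0 later in this section), the signature of dynamic fp16 loss scaling: the scale is cut when gradients overflow and grown back after a run of clean steps. A minimal sketch with torch.cuda.amp under that assumption; the recipe's actual training loop is not shown in this log, and the model/criterion names are illustrative.

import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scale, as 'grad_scale' in the log

def train_step(model, batch, optimizer, criterion):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # skips the update if grads overflowed
    scaler.update()                 # halves the scale on overflow, grows it otherwise
    return loss.detach(), scaler.get_scale()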
], tot_loss[loss=0.06541, simple_loss=0.08917, pruned_loss=0.01223, audio_tagging_loss=0.008592, over 3036768.13 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:16:27,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3659586.6666666665, ans=0.0 2023-11-27 01:16:35,362 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548950 2023-11-27 01:16:44,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-11-27 01:16:55,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3659720.0, ans=0.05 2023-11-27 01:16:55,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3659720.0, ans=0.0 2023-11-27 01:16:58,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3659786.6666666665, ans=0.0 2023-11-27 01:17:08,646 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7900, loss[loss=0.07731, simple_loss=0.1062, pruned_loss=0.01515, audio_tagging_loss=0.009084, over 15393.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.0905, pruned_loss=0.01241, audio_tagging_loss=0.008583, over 3046887.52 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:17:20,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2023-11-27 01:17:23,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 9.289e+01 9.929e+01 1.057e+02 1.408e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-27 01:17:31,359 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549000 2023-11-27 01:17:52,399 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:18:04,857 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7950, loss[loss=0.06092, simple_loss=0.08162, pruned_loss=0.0106, audio_tagging_loss=0.009509, over 14877.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08987, pruned_loss=0.01227, audio_tagging_loss=0.008721, over 3046667.26 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:18:11,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3660186.6666666665, ans=0.125 2023-11-27 01:18:18,145 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 01:18:26,795 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549050 2023-11-27 01:18:30,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3660320.0, ans=0.125 2023-11-27 01:18:31,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2023-11-27 01:18:32,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3660320.0, ans=0.0 2023-11-27 01:18:34,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3660320.0, ans=0.04949747468305833 2023-11-27 01:19:00,889 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8000, loss[loss=0.07139, simple_loss=0.0989, pruned_loss=0.01201, audio_tagging_loss=0.009933, over 16118.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08909, pruned_loss=0.01221, audio_tagging_loss=0.008869, over 3053817.97 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:19:01,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3660520.0, ans=0.125 2023-11-27 01:19:14,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 9.017e+01 9.575e+01 1.027e+02 1.291e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 01:19:22,522 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549100 2023-11-27 01:19:29,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3660653.3333333335, ans=0.0 2023-11-27 01:19:36,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2023-11-27 01:19:39,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3660720.0, ans=0.04949747468305833 2023-11-27 01:19:43,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3660720.0, ans=10.0 2023-11-27 01:19:55,676 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8050, loss[loss=0.07439, simple_loss=0.09819, pruned_loss=0.01395, audio_tagging_loss=0.01135, over 16559.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08779, pruned_loss=0.01195, audio_tagging_loss=0.008958, over 3054048.45 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:19:55,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3660853.3333333335, ans=0.125 2023-11-27 01:20:07,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3660920.0, ans=0.0 2023-11-27 01:20:07,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3660920.0, ans=0.0 2023-11-27 01:20:18,368 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549150 2023-11-27 01:20:42,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.36 vs. 
limit=12.0 2023-11-27 01:20:51,915 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8100, loss[loss=0.06823, simple_loss=0.1045, pruned_loss=0.01012, audio_tagging_loss=0.005847, over 15552.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08834, pruned_loss=0.01205, audio_tagging_loss=0.008927, over 3051022.81 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:20:52,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3661186.6666666665, ans=0.0 2023-11-27 01:21:07,270 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.808e+01 9.534e+01 1.042e+02 1.593e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:21:13,722 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549200 2023-11-27 01:21:14,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=12.0 2023-11-27 01:21:24,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3661386.6666666665, ans=0.125 2023-11-27 01:21:47,908 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8150, loss[loss=0.05411, simple_loss=0.07268, pruned_loss=0.006375, audio_tagging_loss=0.01139, over 16042.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08865, pruned_loss=0.01202, audio_tagging_loss=0.008807, over 3054533.97 frames. ], batch size: 64, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:21:52,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2023-11-27 01:21:53,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3661520.0, ans=0.1 2023-11-27 01:22:07,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3661586.6666666665, ans=0.125 2023-11-27 01:22:09,110 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549250 2023-11-27 01:22:13,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3661653.3333333335, ans=0.1 2023-11-27 01:22:13,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. limit=6.0 2023-11-27 01:22:18,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3661653.3333333335, ans=0.1 2023-11-27 01:22:41,924 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:22:42,946 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8200, loss[loss=0.07645, simple_loss=0.1114, pruned_loss=0.01442, audio_tagging_loss=0.00633, over 15633.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08959, pruned_loss=0.01219, audio_tagging_loss=0.008638, over 3055479.04 frames. 
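The [train_asr.py:1481] WARNING entries in this section all exclude a one-second AudioSet placeholder cut for the same arithmetic reason: 100 input frames shrink to 23 after subsampling, which is fewer than the 24 BPE tokens of the dummy transcript, so no valid transducer alignment exists. A sketch of that validity check; subsampled_frames is an assumed stand-in for the encoder front-end's exact length formula (it does map 100 -> 23, matching the warnings).

def subsampled_frames(num_frames: int, factor: int = 4) -> int:
    # Rough model of a conv front-end with ~4x subsampling; maps
    # 100 -> 23 as in the WARNING lines, but the exact formula in the
    # recipe's encoder_embed may differ.
    return (num_frames - 8) // factor

def is_trainable(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one output frame per token, so a cut
    # whose subsampled length is shorter than its token sequence is
    # excluded from training (see the WARNING entries above).
    return subsampled_frames(num_frames) >= num_tokens

assert not is_trainable(100, 24)   # 23 frames < 24 tokens -> excluded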
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:22:58,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.840e+01 9.434e+01 1.030e+02 1.387e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 01:23:05,229 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549300 2023-11-27 01:23:17,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3662053.3333333335, ans=0.125 2023-11-27 01:23:23,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3662053.3333333335, ans=0.1 2023-11-27 01:23:38,477 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8250, loss[loss=0.05643, simple_loss=0.08121, pruned_loss=0.005782, audio_tagging_loss=0.01004, over 15995.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08883, pruned_loss=0.01203, audio_tagging_loss=0.008682, over 3062203.11 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:23:45,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3662186.6666666665, ans=6.0 2023-11-27 01:23:46,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3662186.6666666665, ans=0.2 2023-11-27 01:24:00,870 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549350 2023-11-27 01:24:01,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2023-11-27 01:24:16,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3662386.6666666665, ans=0.1 2023-11-27 01:24:18,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.65 vs. limit=10.0 2023-11-27 01:24:34,723 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8300, loss[loss=0.04367, simple_loss=0.06102, pruned_loss=0.006106, audio_tagging_loss=0.007052, over 14434.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08953, pruned_loss=0.01203, audio_tagging_loss=0.008645, over 3050288.55 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:24:49,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 9.008e+01 9.718e+01 1.064e+02 1.333e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 01:24:55,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3662653.3333333335, ans=0.1 2023-11-27 01:24:56,128 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549400 2023-11-27 01:24:58,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3662653.3333333335, ans=0.125 2023-11-27 01:25:06,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3662653.3333333335, ans=0.1 2023-11-27 01:25:09,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. 
limit=15.0 2023-11-27 01:25:20,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3662786.6666666665, ans=0.125 2023-11-27 01:25:29,695 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8350, loss[loss=0.07717, simple_loss=0.1029, pruned_loss=0.01784, audio_tagging_loss=0.007885, over 14713.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.0896, pruned_loss=0.01212, audio_tagging_loss=0.00863, over 3051419.58 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:25:46,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3662920.0, ans=0.125 2023-11-27 01:25:52,450 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549450 2023-11-27 01:26:08,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.42 vs. limit=15.0 2023-11-27 01:26:23,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2023-11-27 01:26:24,608 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8400, loss[loss=0.06334, simple_loss=0.08508, pruned_loss=0.012, audio_tagging_loss=0.0088, over 15364.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08805, pruned_loss=0.01172, audio_tagging_loss=0.008677, over 3045549.03 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:26:34,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3663186.6666666665, ans=0.125 2023-11-27 01:26:39,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3663253.3333333335, ans=0.125 2023-11-27 01:26:42,699 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.598e+01 9.317e+01 1.002e+02 1.221e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 01:26:47,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3663320.0, ans=0.125 2023-11-27 01:26:48,128 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549500 2023-11-27 01:26:56,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3663320.0, ans=0.2 2023-11-27 01:27:04,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2023-11-27 01:27:06,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3663386.6666666665, ans=0.125 2023-11-27 01:27:21,400 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8450, loss[loss=0.07625, simple_loss=0.1181, pruned_loss=0.01172, audio_tagging_loss=0.005496, over 15851.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08862, pruned_loss=0.01175, audio_tagging_loss=0.008696, over 3046510.93 frames. 
], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:27:24,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3663520.0, ans=10.0 2023-11-27 01:27:37,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3663586.6666666665, ans=0.125 2023-11-27 01:27:43,105 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549550 2023-11-27 01:27:43,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3663653.3333333335, ans=0.2 2023-11-27 01:27:44,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3663653.3333333335, ans=0.125 2023-11-27 01:28:05,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3663786.6666666665, ans=0.125 2023-11-27 01:28:10,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3663786.6666666665, ans=0.0 2023-11-27 01:28:16,780 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8500, loss[loss=0.05886, simple_loss=0.08961, pruned_loss=0.007503, audio_tagging_loss=0.006553, over 15356.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08858, pruned_loss=0.01167, audio_tagging_loss=0.008676, over 3048779.17 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:28:28,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2023-11-27 01:28:32,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.917e+01 9.803e+01 1.059e+02 2.470e+02, threshold=1.961e+02, percent-clipped=1.0 2023-11-27 01:28:38,826 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549600 2023-11-27 01:28:48,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3663986.6666666665, ans=0.1 2023-11-27 01:29:00,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3664120.0, ans=0.2 2023-11-27 01:29:04,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3664120.0, ans=0.125 2023-11-27 01:29:06,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3664120.0, ans=0.125 2023-11-27 01:29:10,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3664186.6666666665, ans=0.2 2023-11-27 01:29:11,612 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8550, loss[loss=0.06185, simple_loss=0.0871, pruned_loss=0.01011, audio_tagging_loss=0.008195, over 15149.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08791, pruned_loss=0.01153, audio_tagging_loss=0.008754, over 3048666.17 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:29:12,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3664186.6666666665, ans=0.0 2023-11-27 01:29:15,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.89 vs. limit=10.0 2023-11-27 01:29:34,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3664320.0, ans=0.125 2023-11-27 01:29:35,123 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549650 2023-11-27 01:29:35,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3664320.0, ans=0.0 2023-11-27 01:29:38,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. limit=15.0 2023-11-27 01:29:39,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=22.5 2023-11-27 01:30:00,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3664453.3333333335, ans=0.125 2023-11-27 01:30:04,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3664453.3333333335, ans=0.05 2023-11-27 01:30:06,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3664453.3333333335, ans=0.125 2023-11-27 01:30:07,866 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8600, loss[loss=0.05619, simple_loss=0.07915, pruned_loss=0.007732, audio_tagging_loss=0.008888, over 14463.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08831, pruned_loss=0.01159, audio_tagging_loss=0.008767, over 3050016.46 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:30:15,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3664520.0, ans=0.125 2023-11-27 01:30:24,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.820e+01 9.467e+01 9.988e+01 1.186e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 01:30:29,568 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549700 2023-11-27 01:30:31,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3664653.3333333335, ans=0.125 2023-11-27 01:30:35,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3664653.3333333335, ans=0.0 2023-11-27 01:31:03,586 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8650, loss[loss=0.08988, simple_loss=0.124, pruned_loss=0.01978, audio_tagging_loss=0.008104, over 16537.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08909, pruned_loss=0.01172, audio_tagging_loss=0.00878, over 3049352.22 frames. 
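The [scaling.py:1022] Whitening entries compare a per-module metric against a limit (e.g., metric=12.44 vs. limit=22.5 for ...self_attn1.whiten above). A plausible, hedged reading: the metric measures how far the channel covariance of the module's activations is from a multiple of the identity, sitting at 1.0 for perfectly white features and growing as variance concentrates in fewer directions, with the 'vs. limit' comparison deciding whether a corrective penalty applies. The sketch below implements one such metric; it is an assumption, not the actual scaling.py code.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (frames, channels). Returns 1.0 when the channel covariance is
    # proportional to the identity and grows toward num_channels as the
    # variance collapses onto fewer directions; an illustrative stand-in
    # for the metric the Whitening log lines report against 'limit'.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    num_channels = cov.shape[0]
    # mean squared eigenvalue / squared mean eigenvalue, via traces
    return (num_channels * (cov * cov).sum()
            / (torch.diag(cov).sum() ** 2 + 1e-20)).item()

white = torch.randn(10000, 384)
print(whitening_metric(white))                                   # ~1.0
print(whitening_metric(white * torch.linspace(0.1, 3.0, 384)))   # > 1.0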
], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:31:03,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3664853.3333333335, ans=0.0 2023-11-27 01:31:05,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3664853.3333333335, ans=0.1 2023-11-27 01:31:18,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3664920.0, ans=0.07 2023-11-27 01:31:23,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3664920.0, ans=0.0 2023-11-27 01:31:25,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3664986.6666666665, ans=0.0 2023-11-27 01:31:25,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3664986.6666666665, ans=0.125 2023-11-27 01:31:26,038 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549750 2023-11-27 01:31:37,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=22.5 2023-11-27 01:31:38,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3665053.3333333335, ans=0.125 2023-11-27 01:31:44,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2023-11-27 01:31:54,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3665120.0, ans=0.05 2023-11-27 01:31:58,570 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8700, loss[loss=0.0552, simple_loss=0.07486, pruned_loss=0.009614, audio_tagging_loss=0.008153, over 14839.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08895, pruned_loss=0.01157, audio_tagging_loss=0.008794, over 3045467.76 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:31:59,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0 2023-11-27 01:32:11,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3665253.3333333335, ans=0.125 2023-11-27 01:32:15,445 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.944e+01 9.069e+01 9.762e+01 1.053e+02 1.470e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 01:32:21,542 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549800 2023-11-27 01:32:22,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3665320.0, ans=0.04949747468305833 2023-11-27 01:32:31,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.27 vs. 
limit=10.0 2023-11-27 01:32:43,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3665453.3333333335, ans=0.125 2023-11-27 01:32:55,288 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8750, loss[loss=0.05586, simple_loss=0.07561, pruned_loss=0.008595, audio_tagging_loss=0.009459, over 14854.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08891, pruned_loss=0.01164, audio_tagging_loss=0.008856, over 3044765.82 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:33:03,469 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:33:16,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.56 vs. limit=22.5 2023-11-27 01:33:17,389 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549850 2023-11-27 01:33:24,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3665653.3333333335, ans=0.125 2023-11-27 01:33:33,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2023-11-27 01:33:50,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.89 vs. limit=6.0 2023-11-27 01:33:50,724 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8800, loss[loss=0.05729, simple_loss=0.07655, pruned_loss=0.009678, audio_tagging_loss=0.009339, over 15693.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09031, pruned_loss=0.01187, audio_tagging_loss=0.008871, over 3056720.89 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:34:05,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3665920.0, ans=10.0 2023-11-27 01:34:08,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.927e+01 8.987e+01 9.532e+01 1.025e+02 1.979e+02, threshold=1.906e+02, percent-clipped=1.0 2023-11-27 01:34:13,061 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549900 2023-11-27 01:34:20,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3665986.6666666665, ans=0.1 2023-11-27 01:34:25,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3666053.3333333335, ans=0.125 2023-11-27 01:34:31,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3666053.3333333335, ans=0.125 2023-11-27 01:34:36,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3666120.0, ans=0.0 2023-11-27 01:34:40,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3666120.0, ans=0.125 2023-11-27 01:34:46,290 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8850, loss[loss=0.05262, simple_loss=0.07008, pruned_loss=0.009487, audio_tagging_loss=0.008093, over 14501.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09029, pruned_loss=0.01213, audio_tagging_loss=0.008863, over 3053423.84 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:34:55,320 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:35:04,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3666253.3333333335, ans=0.2 2023-11-27 01:35:07,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.43 vs. limit=22.5 2023-11-27 01:35:09,299 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549950 2023-11-27 01:35:37,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3666453.3333333335, ans=0.0 2023-11-27 01:35:40,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3666453.3333333335, ans=0.0 2023-11-27 01:35:42,759 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8900, loss[loss=0.07587, simple_loss=0.107, pruned_loss=0.01307, audio_tagging_loss=0.00929, over 15649.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09077, pruned_loss=0.01221, audio_tagging_loss=0.008686, over 3056300.54 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:35:43,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3666520.0, ans=0.2 2023-11-27 01:35:49,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3666520.0, ans=0.125 2023-11-27 01:35:56,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3666586.6666666665, ans=0.5 2023-11-27 01:36:00,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 8.952e+01 9.534e+01 1.026e+02 1.525e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:36:04,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3666653.3333333335, ans=0.1 2023-11-27 01:36:05,261 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550000 2023-11-27 01:36:07,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3666653.3333333335, ans=15.0 2023-11-27 01:36:08,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3666653.3333333335, ans=0.0 2023-11-27 01:36:18,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. 
limit=15.0 2023-11-27 01:36:29,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3666786.6666666665, ans=0.0 2023-11-27 01:36:32,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3666786.6666666665, ans=0.125 2023-11-27 01:36:32,772 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:36:38,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3666853.3333333335, ans=0.125 2023-11-27 01:36:38,838 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8950, loss[loss=0.07826, simple_loss=0.1174, pruned_loss=0.01423, audio_tagging_loss=0.005353, over 14701.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09077, pruned_loss=0.01219, audio_tagging_loss=0.008613, over 3055976.20 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:36:40,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3666853.3333333335, ans=0.125 2023-11-27 01:37:00,455 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550050 2023-11-27 01:37:05,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2023-11-27 01:37:16,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3667053.3333333335, ans=0.125 2023-11-27 01:37:34,274 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9000, loss[loss=0.05708, simple_loss=0.07693, pruned_loss=0.01018, audio_tagging_loss=0.00843, over 16191.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09067, pruned_loss=0.01219, audio_tagging_loss=0.008476, over 3056539.06 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:37:34,275 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 01:38:07,098 INFO [train_asr.py:1267] (1/4) Epoch 46, validation: loss=0.05879, simple_loss=0.05049, pruned_loss=0.005306, audio_tagging_loss=0.02824, over 4681554.00 frames. 2023-11-27 01:38:07,098 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 01:38:22,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2023-11-27 01:38:24,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3667253.3333333335, ans=0.125 2023-11-27 01:38:24,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.95 vs. 
limit=15.0 2023-11-27 01:38:25,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.928e+01 9.533e+01 1.025e+02 1.320e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:38:27,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3667320.0, ans=0.0 2023-11-27 01:38:29,406 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550100 2023-11-27 01:38:29,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2023-11-27 01:38:30,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2023-11-27 01:38:32,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3667320.0, ans=0.0 2023-11-27 01:38:52,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3667453.3333333335, ans=0.0 2023-11-27 01:38:55,630 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:38:59,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3667453.3333333335, ans=0.125 2023-11-27 01:39:02,709 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9050, loss[loss=0.05668, simple_loss=0.08328, pruned_loss=0.007617, audio_tagging_loss=0.00742, over 15087.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.0904, pruned_loss=0.01217, audio_tagging_loss=0.008429, over 3055560.64 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 4.0 2023-11-27 01:39:06,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3667520.0, ans=10.0 2023-11-27 01:39:20,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.04 vs. limit=15.0 2023-11-27 01:39:23,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3667653.3333333335, ans=0.0 2023-11-27 01:39:24,572 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550150 2023-11-27 01:39:33,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3667653.3333333335, ans=0.125 2023-11-27 01:39:44,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3667720.0, ans=0.125 2023-11-27 01:39:58,466 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9100, loss[loss=0.07573, simple_loss=0.1066, pruned_loss=0.01269, audio_tagging_loss=0.009745, over 15245.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0903, pruned_loss=0.01213, audio_tagging_loss=0.00848, over 3059645.72 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:40:03,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.45 vs. 
limit=22.5 2023-11-27 01:40:06,135 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:40:12,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3667920.0, ans=0.0 2023-11-27 01:40:19,179 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 9.136e+01 9.567e+01 1.016e+02 1.322e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 01:40:21,396 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550200 2023-11-27 01:40:36,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3668053.3333333335, ans=0.125 2023-11-27 01:40:46,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3668120.0, ans=0.2 2023-11-27 01:40:51,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.69 vs. limit=22.5 2023-11-27 01:40:54,725 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9150, loss[loss=0.06384, simple_loss=0.09125, pruned_loss=0.009972, audio_tagging_loss=0.008246, over 14530.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08976, pruned_loss=0.01204, audio_tagging_loss=0.008438, over 3061405.39 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:41:15,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-11-27 01:41:16,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550250 2023-11-27 01:41:29,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3668386.6666666665, ans=0.125 2023-11-27 01:41:30,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=22.5 2023-11-27 01:41:35,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3668386.6666666665, ans=0.0 2023-11-27 01:41:43,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3668453.3333333335, ans=0.125 2023-11-27 01:41:44,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3668453.3333333335, ans=0.125 2023-11-27 01:41:44,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-27 01:41:47,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3668453.3333333335, ans=0.125 2023-11-27 01:41:50,297 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9200, loss[loss=0.06472, simple_loss=0.09016, pruned_loss=0.0109, audio_tagging_loss=0.008736, over 15060.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08963, pruned_loss=0.01205, audio_tagging_loss=0.008478, over 3058770.22 frames. 
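The validation pass at batch 9000 above ([train_asr.py:1258]/[train_asr.py:1267]/[train_asr.py:1268]) interleaves a full sweep over the dev loader with the training stream and then reports peak GPU memory. A minimal sketch of that pattern, assuming a conventional eval loop; the model, loader, and criterion names are illustrative, not the recipe's.

import torch

@torch.no_grad()
def compute_validation_loss(model, dev_loader, criterion, device):
    # Frame-weighted average loss over the dev set, as in the
    # 'Computing validation loss' / 'Epoch 46, validation: ...' entries.
    model.eval()
    total, n = 0.0, 0
    for batch in dev_loader:
        loss = criterion(model(batch["inputs"].to(device)),
                         batch["targets"].to(device))
        total += loss.item() * batch["inputs"].shape[0]
        n += batch["inputs"].shape[0]
    model.train()
    return total / max(n, 1)

# Peak memory, reported the way the log does ('Maximum memory allocated
# so far is 25568MB' above):
peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
print(f"Maximum memory allocated so far is {peak_mb}MB")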
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:42:09,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.971e+01 9.683e+01 1.056e+02 2.334e+02, threshold=1.937e+02, percent-clipped=1.0 2023-11-27 01:42:12,031 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550300 2023-11-27 01:42:15,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3668653.3333333335, ans=0.125 2023-11-27 01:42:24,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=15.0 2023-11-27 01:42:40,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3668786.6666666665, ans=0.2 2023-11-27 01:42:42,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3668786.6666666665, ans=0.0 2023-11-27 01:42:45,342 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9250, loss[loss=0.05547, simple_loss=0.06651, pruned_loss=0.00774, audio_tagging_loss=0.01447, over 14115.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08915, pruned_loss=0.01198, audio_tagging_loss=0.008449, over 3054990.03 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:42:46,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3668853.3333333335, ans=0.0 2023-11-27 01:43:03,074 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:43:06,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3668920.0, ans=0.0 2023-11-27 01:43:08,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550350 2023-11-27 01:43:09,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3668986.6666666665, ans=0.95 2023-11-27 01:43:17,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2023-11-27 01:43:28,139 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:43:30,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3669120.0, ans=0.0 2023-11-27 01:43:33,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-27 01:43:34,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3669120.0, ans=0.125 2023-11-27 01:43:41,737 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9300, loss[loss=0.07367, simple_loss=0.1035, pruned_loss=0.01258, audio_tagging_loss=0.009328, over 16448.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08891, pruned_loss=0.01177, audio_tagging_loss=0.008468, over 3056363.29 frames. 
], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:43:45,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3669186.6666666665, ans=0.125 2023-11-27 01:43:58,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2023-11-27 01:44:01,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 8.933e+01 9.435e+01 1.011e+02 1.310e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 01:44:04,071 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550400 2023-11-27 01:44:14,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3669386.6666666665, ans=0.125 2023-11-27 01:44:15,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3669386.6666666665, ans=0.0 2023-11-27 01:44:17,110 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:44:22,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-27 01:44:27,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3669453.3333333335, ans=0.2 2023-11-27 01:44:31,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-27 01:44:38,041 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9350, loss[loss=0.05819, simple_loss=0.07074, pruned_loss=0.01044, audio_tagging_loss=0.01239, over 13619.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08939, pruned_loss=0.01206, audio_tagging_loss=0.008475, over 3049657.79 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:44:44,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2023-11-27 01:44:59,436 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550450 2023-11-27 01:44:59,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3669653.3333333335, ans=0.125 2023-11-27 01:45:06,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3669653.3333333335, ans=0.125 2023-11-27 01:45:13,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3669720.0, ans=0.125 2023-11-27 01:45:14,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. 
limit=6.0 2023-11-27 01:45:28,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3669786.6666666665, ans=0.1 2023-11-27 01:45:29,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3669786.6666666665, ans=0.0 2023-11-27 01:45:32,399 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:45:33,198 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9400, loss[loss=0.07071, simple_loss=0.09213, pruned_loss=0.01596, audio_tagging_loss=0.008696, over 15345.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08869, pruned_loss=0.01202, audio_tagging_loss=0.008599, over 3053975.31 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:45:35,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.27 vs. limit=22.5 2023-11-27 01:45:38,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=3669853.3333333335, ans=12.0 2023-11-27 01:45:54,111 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 8.852e+01 9.637e+01 1.052e+02 1.350e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 01:45:56,317 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550500 2023-11-27 01:45:56,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3669986.6666666665, ans=0.04949747468305833 2023-11-27 01:46:11,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3670053.3333333335, ans=0.1 2023-11-27 01:46:12,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3670053.3333333335, ans=0.125 2023-11-27 01:46:15,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3670053.3333333335, ans=0.125 2023-11-27 01:46:25,299 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:46:29,034 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9450, loss[loss=0.07011, simple_loss=0.09696, pruned_loss=0.01534, audio_tagging_loss=0.006291, over 15630.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08926, pruned_loss=0.01211, audio_tagging_loss=0.008664, over 3054472.25 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:46:37,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3670186.6666666665, ans=0.1 2023-11-27 01:46:48,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3670253.3333333335, ans=0.2 2023-11-27 01:46:51,968 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550550 2023-11-27 01:46:52,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3670320.0, ans=0.125 2023-11-27 01:46:57,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0 2023-11-27 01:47:02,562 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:47:25,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3670520.0, ans=15.0 2023-11-27 01:47:25,723 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9500, loss[loss=0.07151, simple_loss=0.08953, pruned_loss=0.01579, audio_tagging_loss=0.01096, over 14369.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09013, pruned_loss=0.01226, audio_tagging_loss=0.008714, over 3057677.14 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:47:38,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3670586.6666666665, ans=15.0 2023-11-27 01:47:44,654 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.026e+01 9.482e+01 1.013e+02 1.263e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 01:47:44,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3670586.6666666665, ans=0.125 2023-11-27 01:47:46,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.93 vs. limit=15.0 2023-11-27 01:47:46,844 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550600 2023-11-27 01:48:04,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3670720.0, ans=0.0 2023-11-27 01:48:05,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.90 vs. limit=10.0 2023-11-27 01:48:20,645 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9550, loss[loss=0.06371, simple_loss=0.08067, pruned_loss=0.01031, audio_tagging_loss=0.01306, over 14895.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.0903, pruned_loss=0.01222, audio_tagging_loss=0.008833, over 3052298.93 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:48:28,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3670853.3333333335, ans=0.07 2023-11-27 01:48:43,563 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550650 2023-11-27 01:49:00,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3671053.3333333335, ans=0.1 2023-11-27 01:49:15,920 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9600, loss[loss=0.06015, simple_loss=0.09177, pruned_loss=0.008858, audio_tagging_loss=0.005409, over 16132.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08974, pruned_loss=0.01217, audio_tagging_loss=0.00884, over 3045709.23 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:49:37,141 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.787e+01 9.468e+01 1.030e+02 1.227e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-27 01:49:39,347 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550700 2023-11-27 01:49:40,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-11-27 01:50:09,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3671453.3333333335, ans=0.125 2023-11-27 01:50:12,744 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9650, loss[loss=0.05286, simple_loss=0.06292, pruned_loss=0.01171, audio_tagging_loss=0.009689, over 15000.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08924, pruned_loss=0.01205, audio_tagging_loss=0.008867, over 3050395.65 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:50:22,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2023-11-27 01:50:34,456 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550750 2023-11-27 01:50:35,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3671653.3333333335, ans=0.0 2023-11-27 01:50:46,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.79 vs. limit=22.5 2023-11-27 01:51:08,168 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9700, loss[loss=0.0668, simple_loss=0.09229, pruned_loss=0.01081, audio_tagging_loss=0.009837, over 15677.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08951, pruned_loss=0.01206, audio_tagging_loss=0.008773, over 3049608.21 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:51:28,967 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.991e+01 9.696e+01 1.056e+02 1.366e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 01:51:30,117 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550800 2023-11-27 01:51:42,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3672053.3333333335, ans=0.0 2023-11-27 01:51:46,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3672053.3333333335, ans=0.125 2023-11-27 01:52:03,768 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9750, loss[loss=0.06445, simple_loss=0.09159, pruned_loss=0.01062, audio_tagging_loss=0.008031, over 15665.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08919, pruned_loss=0.01191, audio_tagging_loss=0.008618, over 3044888.61 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:52:07,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3672186.6666666665, ans=0.0 2023-11-27 01:52:14,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2023-11-27 01:52:21,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3672253.3333333335, ans=0.125 2023-11-27 01:52:27,141 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550850 2023-11-27 01:52:44,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3672386.6666666665, ans=0.125 2023-11-27 01:52:59,827 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9800, loss[loss=0.0717, simple_loss=0.09563, pruned_loss=0.01485, audio_tagging_loss=0.009034, over 14788.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08894, pruned_loss=0.01208, audio_tagging_loss=0.008614, over 3041804.28 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:53:12,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3672586.6666666665, ans=0.1 2023-11-27 01:53:20,924 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.968e+01 9.826e+01 1.047e+02 1.265e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-27 01:53:22,069 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550900 2023-11-27 01:53:22,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3672653.3333333335, ans=0.125 2023-11-27 01:53:47,917 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:53:55,688 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9850, loss[loss=0.07506, simple_loss=0.1061, pruned_loss=0.0149, audio_tagging_loss=0.007126, over 15579.00 frames. 
], tot_loss[loss=0.0653, simple_loss=0.08938, pruned_loss=0.01214, audio_tagging_loss=0.008471, over 3047711.05 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:53:56,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3672853.3333333335, ans=0.1 2023-11-27 01:54:03,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3672853.3333333335, ans=0.0 2023-11-27 01:54:08,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.13 vs. limit=10.0 2023-11-27 01:54:16,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2023-11-27 01:54:16,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3672986.6666666665, ans=0.0 2023-11-27 01:54:17,514 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550950 2023-11-27 01:54:22,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2023-11-27 01:54:41,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.81 vs. limit=15.0 2023-11-27 01:54:42,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3673120.0, ans=0.2 2023-11-27 01:54:50,666 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9900, loss[loss=0.06365, simple_loss=0.09278, pruned_loss=0.01293, audio_tagging_loss=0.004331, over 14664.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08934, pruned_loss=0.01217, audio_tagging_loss=0.008482, over 3049168.81 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:54:58,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3673186.6666666665, ans=10.0 2023-11-27 01:55:13,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.996e+01 9.617e+01 1.030e+02 1.836e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 01:55:13,728 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551000 2023-11-27 01:55:22,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3673320.0, ans=0.125 2023-11-27 01:55:25,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3673386.6666666665, ans=0.0 2023-11-27 01:55:40,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3673453.3333333335, ans=0.125 2023-11-27 01:55:47,253 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9950, loss[loss=0.07566, simple_loss=0.09219, pruned_loss=0.01937, audio_tagging_loss=0.0102, over 14734.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08888, pruned_loss=0.01201, audio_tagging_loss=0.008505, over 3047769.88 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:55:48,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3673520.0, ans=0.1 2023-11-27 01:56:09,646 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551050 2023-11-27 01:56:17,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5 2023-11-27 01:56:31,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3673786.6666666665, ans=0.0 2023-11-27 01:56:35,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3673786.6666666665, ans=0.125 2023-11-27 01:56:37,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3673786.6666666665, ans=0.0 2023-11-27 01:56:42,974 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10000, loss[loss=0.06544, simple_loss=0.09448, pruned_loss=0.009579, audio_tagging_loss=0.008625, over 14992.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08794, pruned_loss=0.01194, audio_tagging_loss=0.008563, over 3045717.35 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:56:59,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3673920.0, ans=0.125 2023-11-27 01:57:01,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3673920.0, ans=0.125 2023-11-27 01:57:05,441 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.059e+01 8.920e+01 9.463e+01 1.026e+02 1.255e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 01:57:05,532 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551100 2023-11-27 01:57:10,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3673986.6666666665, ans=0.1 2023-11-27 01:57:16,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3674053.3333333335, ans=0.125 2023-11-27 01:57:19,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3674053.3333333335, ans=0.125 2023-11-27 01:57:38,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.88 vs. limit=6.0 2023-11-27 01:57:38,700 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10050, loss[loss=0.06534, simple_loss=0.09135, pruned_loss=0.009882, audio_tagging_loss=0.009785, over 15610.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08812, pruned_loss=0.01183, audio_tagging_loss=0.008604, over 3046557.53 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:57:41,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2023-11-27 01:57:50,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.81 vs. 
limit=15.0 2023-11-27 01:57:57,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3674253.3333333335, ans=0.035 2023-11-27 01:57:58,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=22.5 2023-11-27 01:58:01,634 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551150 2023-11-27 01:58:10,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3674320.0, ans=0.0 2023-11-27 01:58:34,213 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10100, loss[loss=0.07362, simple_loss=0.1013, pruned_loss=0.01476, audio_tagging_loss=0.008227, over 14878.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08766, pruned_loss=0.01166, audio_tagging_loss=0.008645, over 3051615.54 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:58:57,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.911e+01 9.483e+01 1.012e+02 1.276e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 01:58:57,189 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551200 2023-11-27 01:59:02,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3674653.3333333335, ans=0.0 2023-11-27 01:59:06,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=12.0 2023-11-27 01:59:13,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5 2023-11-27 01:59:15,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3674720.0, ans=0.125 2023-11-27 01:59:18,101 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:59:24,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3674786.6666666665, ans=0.125 2023-11-27 01:59:30,772 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10150, loss[loss=0.04738, simple_loss=0.05231, pruned_loss=0.00779, audio_tagging_loss=0.01344, over 16348.00 frames. ], tot_loss[loss=0.06374, simple_loss=0.0869, pruned_loss=0.01156, audio_tagging_loss=0.008731, over 3046906.03 frames. 
], batch size: 65, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:59:43,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3674920.0, ans=0.2 2023-11-27 01:59:44,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3674920.0, ans=0.0 2023-11-27 01:59:45,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3674920.0, ans=0.125 2023-11-27 01:59:50,918 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:59:52,886 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551250 2023-11-27 01:59:55,511 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:00:02,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3674986.6666666665, ans=0.125 2023-11-27 02:00:09,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.44 vs. limit=22.5 2023-11-27 02:00:11,745 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:00:12,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2023-11-27 02:00:26,862 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10200, loss[loss=0.07091, simple_loss=0.09026, pruned_loss=0.01625, audio_tagging_loss=0.009535, over 15863.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08754, pruned_loss=0.01163, audio_tagging_loss=0.00872, over 3045302.78 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:00:28,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3675186.6666666665, ans=0.0 2023-11-27 02:00:34,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3675186.6666666665, ans=0.1 2023-11-27 02:00:45,960 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 02:00:49,093 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 8.864e+01 9.560e+01 1.043e+02 1.445e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 02:00:49,204 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551300 2023-11-27 02:00:59,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3675386.6666666665, ans=0.125 2023-11-27 02:01:04,707 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:01:12,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3675453.3333333335, ans=0.125 2023-11-27 02:01:17,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3675453.3333333335, ans=0.125 2023-11-27 02:01:21,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3675520.0, ans=0.125 2023-11-27 02:01:22,519 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10250, loss[loss=0.0607, simple_loss=0.0791, pruned_loss=0.01267, audio_tagging_loss=0.008472, over 14389.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08769, pruned_loss=0.0117, audio_tagging_loss=0.008749, over 3042698.31 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:01:25,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=12.0 2023-11-27 02:01:36,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3675586.6666666665, ans=0.1 2023-11-27 02:01:42,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3675586.6666666665, ans=0.125 2023-11-27 02:01:44,919 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551350 2023-11-27 02:01:56,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.21 vs. limit=22.5 2023-11-27 02:02:07,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=12.0 2023-11-27 02:02:12,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=8.0 2023-11-27 02:02:14,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3675786.6666666665, ans=0.05 2023-11-27 02:02:15,597 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:02:16,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3675786.6666666665, ans=0.2 2023-11-27 02:02:18,466 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10300, loss[loss=0.07117, simple_loss=0.1043, pruned_loss=0.01324, audio_tagging_loss=0.005774, over 14874.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08809, pruned_loss=0.01178, audio_tagging_loss=0.008856, over 3042181.82 frames. 
], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:02:40,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.971e+01 9.641e+01 1.026e+02 1.769e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 02:02:40,308 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551400 2023-11-27 02:02:40,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.56 vs. limit=22.5 2023-11-27 02:03:06,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2023-11-27 02:03:13,883 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10350, loss[loss=0.07981, simple_loss=0.1103, pruned_loss=0.01728, audio_tagging_loss=0.007363, over 15447.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08833, pruned_loss=0.01185, audio_tagging_loss=0.008915, over 3038415.16 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:03:31,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3676253.3333333335, ans=0.0 2023-11-27 02:03:36,799 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551450 2023-11-27 02:03:46,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3676386.6666666665, ans=0.1 2023-11-27 02:03:54,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0 2023-11-27 02:04:09,404 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10400, loss[loss=0.08095, simple_loss=0.106, pruned_loss=0.02044, audio_tagging_loss=0.00749, over 14627.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08915, pruned_loss=0.01199, audio_tagging_loss=0.008922, over 3042009.85 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:04:11,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3676520.0, ans=0.125 2023-11-27 02:04:23,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3676586.6666666665, ans=0.125 2023-11-27 02:04:32,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 9.109e+01 9.691e+01 1.057e+02 2.130e+02, threshold=1.938e+02, percent-clipped=1.0 2023-11-27 02:04:32,122 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551500 2023-11-27 02:04:40,762 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:04:46,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3676720.0, ans=0.125 2023-11-27 02:05:05,161 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10450, loss[loss=0.05058, simple_loss=0.08113, pruned_loss=0.004298, audio_tagging_loss=0.005718, over 14028.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08954, pruned_loss=0.01197, audio_tagging_loss=0.008813, over 3038751.92 frames. 
], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:05:05,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3676853.3333333335, ans=0.125 2023-11-27 02:05:26,704 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551550 2023-11-27 02:05:36,504 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:05:39,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3677053.3333333335, ans=0.1 2023-11-27 02:05:49,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3677120.0, ans=0.0 2023-11-27 02:05:59,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3677186.6666666665, ans=0.1 2023-11-27 02:06:00,585 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10500, loss[loss=0.05978, simple_loss=0.07704, pruned_loss=0.01219, audio_tagging_loss=0.009072, over 15310.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.0893, pruned_loss=0.01201, audio_tagging_loss=0.008701, over 3039689.40 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:06:08,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.55 vs. limit=15.0 2023-11-27 02:06:22,967 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 9.041e+01 9.594e+01 1.033e+02 2.053e+02, threshold=1.919e+02, percent-clipped=1.0 2023-11-27 02:06:23,058 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551600 2023-11-27 02:06:50,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3677453.3333333335, ans=0.0 2023-11-27 02:06:56,000 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10550, loss[loss=0.06143, simple_loss=0.07903, pruned_loss=0.01388, audio_tagging_loss=0.008032, over 14361.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.0885, pruned_loss=0.01187, audio_tagging_loss=0.008648, over 3035549.68 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:07:09,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3677586.6666666665, ans=0.1 2023-11-27 02:07:11,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3677586.6666666665, ans=0.0 2023-11-27 02:07:19,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2023-11-27 02:07:19,465 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551650 2023-11-27 02:07:22,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.25 vs. limit=6.0 2023-11-27 02:07:52,990 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10600, loss[loss=0.05022, simple_loss=0.06595, pruned_loss=0.00919, audio_tagging_loss=0.008052, over 13772.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08915, pruned_loss=0.01193, audio_tagging_loss=0.008514, over 3034752.68 frames. 
], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:07:58,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3677853.3333333335, ans=0.0 2023-11-27 02:07:59,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3677853.3333333335, ans=0.0 2023-11-27 02:08:14,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.957e+01 9.483e+01 1.042e+02 1.260e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 02:08:14,900 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551700 2023-11-27 02:08:18,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3677986.6666666665, ans=0.0 2023-11-27 02:08:19,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=12.0 2023-11-27 02:08:21,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3677986.6666666665, ans=0.1 2023-11-27 02:08:26,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3678053.3333333335, ans=0.1 2023-11-27 02:08:33,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3678053.3333333335, ans=0.0 2023-11-27 02:08:48,505 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10650, loss[loss=0.05768, simple_loss=0.07388, pruned_loss=0.01213, audio_tagging_loss=0.008612, over 15102.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08907, pruned_loss=0.01206, audio_tagging_loss=0.008532, over 3035328.23 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:08:58,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=22.5 2023-11-27 02:08:59,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3678253.3333333335, ans=0.2 2023-11-27 02:09:02,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3678253.3333333335, ans=0.125 2023-11-27 02:09:02,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0 2023-11-27 02:09:10,277 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551750 2023-11-27 02:09:36,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2023-11-27 02:09:41,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3678453.3333333335, ans=0.125 2023-11-27 02:09:43,005 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10700, loss[loss=0.05745, simple_loss=0.07129, pruned_loss=0.01218, audio_tagging_loss=0.009628, over 14458.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08847, pruned_loss=0.01192, audio_tagging_loss=0.008616, over 3036718.67 frames. 
], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:09:48,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3678520.0, ans=0.1 2023-11-27 02:10:06,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551800 2023-11-27 02:10:07,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 8.981e+01 9.456e+01 1.028e+02 1.264e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 02:10:18,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2023-11-27 02:10:26,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2023-11-27 02:10:40,251 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10750, loss[loss=0.07028, simple_loss=0.09828, pruned_loss=0.01335, audio_tagging_loss=0.007786, over 14911.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08897, pruned_loss=0.01189, audio_tagging_loss=0.00856, over 3039233.36 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:10:48,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3678853.3333333335, ans=0.125 2023-11-27 02:11:01,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551850 2023-11-27 02:11:05,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3678986.6666666665, ans=0.125 2023-11-27 02:11:23,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2023-11-27 02:11:35,276 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10800, loss[loss=0.06795, simple_loss=0.09903, pruned_loss=0.01199, audio_tagging_loss=0.006442, over 15544.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08946, pruned_loss=0.01197, audio_tagging_loss=0.008489, over 3036545.85 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:11:39,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3679186.6666666665, ans=0.125 2023-11-27 02:11:41,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3679186.6666666665, ans=10.0 2023-11-27 02:11:54,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.40 vs. 
limit=15.0 2023-11-27 02:11:57,087 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551900 2023-11-27 02:11:59,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.776e+01 9.602e+01 1.034e+02 1.420e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 02:11:59,377 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:12:05,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3679320.0, ans=10.0 2023-11-27 02:12:26,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2023-11-27 02:12:30,733 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10850, loss[loss=0.07228, simple_loss=0.1028, pruned_loss=0.01557, audio_tagging_loss=0.005288, over 15077.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08834, pruned_loss=0.01182, audio_tagging_loss=0.008582, over 3044103.65 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:12:49,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3679586.6666666665, ans=0.125 2023-11-27 02:12:54,121 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551950 2023-11-27 02:13:05,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3679720.0, ans=0.2 2023-11-27 02:13:20,986 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:13:26,298 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10900, loss[loss=0.07341, simple_loss=0.09905, pruned_loss=0.01435, audio_tagging_loss=0.009535, over 15067.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08825, pruned_loss=0.01169, audio_tagging_loss=0.008714, over 3050067.41 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:13:33,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3679853.3333333335, ans=0.1 2023-11-27 02:13:46,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3679920.0, ans=0.2 2023-11-27 02:13:49,332 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552000 2023-11-27 02:13:53,599 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 8.930e+01 9.586e+01 1.062e+02 1.591e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 02:13:56,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3679986.6666666665, ans=0.0 2023-11-27 02:14:00,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3679986.6666666665, ans=0.02 2023-11-27 02:14:06,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3680053.3333333335, ans=0.125 2023-11-27 02:14:15,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3680120.0, ans=0.125 2023-11-27 02:14:17,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3680120.0, ans=0.125 2023-11-27 02:14:25,456 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10950, loss[loss=0.03572, simple_loss=0.04238, pruned_loss=0.003212, audio_tagging_loss=0.01132, over 14087.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08908, pruned_loss=0.01193, audio_tagging_loss=0.008689, over 3049569.47 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:14:36,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=12.0 2023-11-27 02:14:38,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3680253.3333333335, ans=0.125 2023-11-27 02:14:44,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3680253.3333333335, ans=0.125 2023-11-27 02:14:46,788 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552050 2023-11-27 02:14:48,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.42 vs. 
limit=15.0 2023-11-27 02:14:50,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3680320.0, ans=0.125 2023-11-27 02:14:51,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3680320.0, ans=0.2 2023-11-27 02:14:55,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3680320.0, ans=0.125 2023-11-27 02:14:56,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3680320.0, ans=0.125 2023-11-27 02:15:06,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3680386.6666666665, ans=0.1 2023-11-27 02:15:11,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3680453.3333333335, ans=0.1 2023-11-27 02:15:14,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3680453.3333333335, ans=0.125 2023-11-27 02:15:16,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.50 vs. limit=15.0 2023-11-27 02:15:16,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2023-11-27 02:15:18,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3680453.3333333335, ans=0.0 2023-11-27 02:15:20,770 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11000, loss[loss=0.07955, simple_loss=0.1024, pruned_loss=0.01542, audio_tagging_loss=0.01294, over 15334.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09045, pruned_loss=0.01217, audio_tagging_loss=0.008627, over 3056977.38 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:15:24,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3680520.0, ans=0.1 2023-11-27 02:15:26,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3680520.0, ans=0.2 2023-11-27 02:15:27,159 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 02:15:27,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3680520.0, ans=0.2 2023-11-27 02:15:40,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3680586.6666666665, ans=0.1 2023-11-27 02:15:43,613 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552100 2023-11-27 02:15:46,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.927e+01 9.605e+01 1.045e+02 1.330e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 02:16:07,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3680786.6666666665, ans=0.0 2023-11-27 02:16:13,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3680786.6666666665, ans=0.0 2023-11-27 02:16:16,372 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11050, loss[loss=0.0572, simple_loss=0.06948, pruned_loss=0.0111, audio_tagging_loss=0.01136, over 14761.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08996, pruned_loss=0.01228, audio_tagging_loss=0.008778, over 3057063.17 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:16:16,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3680853.3333333335, ans=0.125 2023-11-27 02:16:26,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3680853.3333333335, ans=0.125 2023-11-27 02:16:30,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3680920.0, ans=0.125 2023-11-27 02:16:32,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3680920.0, ans=0.125 2023-11-27 02:16:34,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3680920.0, ans=0.125 2023-11-27 02:16:39,092 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552150 2023-11-27 02:16:41,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3680986.6666666665, ans=0.0 2023-11-27 02:16:44,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2023-11-27 02:16:49,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3681053.3333333335, ans=0.95 2023-11-27 02:16:50,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3681053.3333333335, ans=0.035 2023-11-27 02:16:58,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3681053.3333333335, ans=0.05 2023-11-27 02:17:13,209 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11100, loss[loss=0.05375, simple_loss=0.0706, pruned_loss=0.00926, audio_tagging_loss=0.009193, over 15703.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08946, pruned_loss=0.01217, audio_tagging_loss=0.008858, over 3056599.90 frames. 
], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:17:23,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3681253.3333333335, ans=0.125 2023-11-27 02:17:24,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3681253.3333333335, ans=0.0 2023-11-27 02:17:31,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.15 vs. limit=15.0 2023-11-27 02:17:34,507 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552200 2023-11-27 02:17:36,805 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.882e+01 9.437e+01 1.044e+02 2.360e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 02:17:39,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3681320.0, ans=0.1 2023-11-27 02:17:49,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3681386.6666666665, ans=0.0 2023-11-27 02:17:50,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3681386.6666666665, ans=0.0 2023-11-27 02:17:53,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3681386.6666666665, ans=0.125 2023-11-27 02:17:57,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.29 vs. limit=10.0 2023-11-27 02:18:00,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3681453.3333333335, ans=0.125 2023-11-27 02:18:08,114 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11150, loss[loss=0.07117, simple_loss=0.08927, pruned_loss=0.01593, audio_tagging_loss=0.0106, over 15569.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.0899, pruned_loss=0.01234, audio_tagging_loss=0.008858, over 3055780.46 frames. 
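[Note] The optim.py:476 lines report quartiles of recent gradient norms together with a clipping threshold, and the logged threshold consistently equals Clipping_scale times the median quartile (e.g. 2.0 * 9.437e+01 ~= 1.887e+02 just above). A rough sketch of that behaviour, assuming a running history of norms; the actual ScaledAdam clipping in icefall's optim.py is more elaborate, and the class below is illustrative only.

import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms: list[float] = []

    def clip_(self, params) -> float:
        # Global norm over all parameter gradients for this step.
        norm = torch.norm(
            torch.stack([p.grad.norm() for p in params if p.grad is not None])
        ).item()
        self.norms = (self.norms + [norm])[-self.history:]
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:          # such steps count toward "percent-clipped"
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return norm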
], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:18:14,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3681520.0, ans=0.05 2023-11-27 02:18:16,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3681520.0, ans=0.0 2023-11-27 02:18:23,772 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:18:24,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3681586.6666666665, ans=0.125 2023-11-27 02:18:30,413 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552250 2023-11-27 02:18:30,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3681653.3333333335, ans=0.125 2023-11-27 02:18:39,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3681653.3333333335, ans=0.1 2023-11-27 02:18:45,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3681720.0, ans=0.125 2023-11-27 02:18:47,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3681720.0, ans=0.125 2023-11-27 02:19:01,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3681786.6666666665, ans=0.0 2023-11-27 02:19:01,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=12.0 2023-11-27 02:19:03,615 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11200, loss[loss=0.09244, simple_loss=0.1365, pruned_loss=0.01938, audio_tagging_loss=0.004801, over 15168.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09082, pruned_loss=0.01233, audio_tagging_loss=0.008903, over 3058284.49 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:19:06,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3681853.3333333335, ans=0.125 2023-11-27 02:19:21,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3681920.0, ans=0.1 2023-11-27 02:19:26,597 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552300 2023-11-27 02:19:28,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.953e+01 9.456e+01 1.019e+02 1.233e+02, threshold=1.891e+02, percent-clipped=1.0 2023-11-27 02:19:51,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3682120.0, ans=0.2 2023-11-27 02:19:52,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3682120.0, ans=0.125 2023-11-27 02:19:59,864 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11250, loss[loss=0.05306, simple_loss=0.05859, pruned_loss=0.01046, audio_tagging_loss=0.0133, over 15702.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09069, pruned_loss=0.01234, audio_tagging_loss=0.008852, over 3057700.29 frames. 
], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:20:15,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3682253.3333333335, ans=0.125 2023-11-27 02:20:21,700 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552350 2023-11-27 02:20:54,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3682520.0, ans=0.0 2023-11-27 02:20:55,484 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11300, loss[loss=0.07245, simple_loss=0.1029, pruned_loss=0.01468, audio_tagging_loss=0.0063, over 16579.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09069, pruned_loss=0.01229, audio_tagging_loss=0.008684, over 3062088.08 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:20:58,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.07 vs. limit=15.0 2023-11-27 02:21:05,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3682586.6666666665, ans=0.2 2023-11-27 02:21:11,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3682586.6666666665, ans=0.2 2023-11-27 02:21:17,682 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552400 2023-11-27 02:21:18,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2023-11-27 02:21:21,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 9.061e+01 9.736e+01 1.047e+02 2.003e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-27 02:21:33,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3682720.0, ans=0.125 2023-11-27 02:21:50,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3682853.3333333335, ans=0.07 2023-11-27 02:21:50,805 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11350, loss[loss=0.05712, simple_loss=0.074, pruned_loss=0.01124, audio_tagging_loss=0.008872, over 14975.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09054, pruned_loss=0.01227, audio_tagging_loss=0.008527, over 3060327.25 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:22:13,332 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552450 2023-11-27 02:22:20,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.77 vs. limit=10.0 2023-11-27 02:22:27,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3683053.3333333335, ans=0.0 2023-11-27 02:22:34,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0 2023-11-27 02:22:35,467 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:22:46,336 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11400, loss[loss=0.07058, simple_loss=0.1038, pruned_loss=0.01285, audio_tagging_loss=0.005835, over 15242.00 frames. 
], tot_loss[loss=0.06613, simple_loss=0.09081, pruned_loss=0.01233, audio_tagging_loss=0.008399, over 3050961.21 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:22:57,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3683253.3333333335, ans=0.125 2023-11-27 02:22:58,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3683253.3333333335, ans=0.125 2023-11-27 02:23:04,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3683253.3333333335, ans=0.0 2023-11-27 02:23:07,900 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:23:08,781 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552500 2023-11-27 02:23:11,779 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.783e+01 9.008e+01 9.574e+01 1.020e+02 1.271e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 02:23:19,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2023-11-27 02:23:29,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3683386.6666666665, ans=0.125 2023-11-27 02:23:29,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3683453.3333333335, ans=0.125 2023-11-27 02:23:41,790 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11450, loss[loss=0.08447, simple_loss=0.1189, pruned_loss=0.01905, audio_tagging_loss=0.005946, over 14540.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09055, pruned_loss=0.01231, audio_tagging_loss=0.008416, over 3050193.16 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:23:49,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3683520.0, ans=0.0 2023-11-27 02:24:00,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3683586.6666666665, ans=0.1 2023-11-27 02:24:03,996 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552550 2023-11-27 02:24:12,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=22.5 2023-11-27 02:24:23,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3683720.0, ans=0.125 2023-11-27 02:24:37,127 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11500, loss[loss=0.05906, simple_loss=0.08045, pruned_loss=0.01022, audio_tagging_loss=0.008617, over 15788.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08969, pruned_loss=0.01205, audio_tagging_loss=0.008462, over 3054112.38 frames. 
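[Note] The loss fields fit a simple linear combination: loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, i.e. the configured simple_loss_scale and audio_tagging_loss_scale applied to the transducer and tagging terms. A quick check against the batch 11500 running totals above (this is an inference from the logged numbers, not a quote of the training code):

simple_loss_scale, audio_tagging_loss_scale = 0.5, 1.0
simple, pruned, tagging = 0.08969, 0.01205, 0.008462
loss = simple_loss_scale * simple + pruned + audio_tagging_loss_scale * tagging
assert abs(loss - 0.06536) < 5e-5   # matches tot_loss[...] in the entry above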
], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:24:42,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3683853.3333333335, ans=0.125 2023-11-27 02:24:45,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3683853.3333333335, ans=0.0 2023-11-27 02:24:47,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3683920.0, ans=0.125 2023-11-27 02:24:52,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3683920.0, ans=0.125 2023-11-27 02:24:59,792 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552600 2023-11-27 02:25:03,206 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.865e+01 9.337e+01 9.934e+01 1.227e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 02:25:04,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2023-11-27 02:25:07,757 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:25:09,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3684053.3333333335, ans=0.0 2023-11-27 02:25:11,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-11-27 02:25:20,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3684120.0, ans=0.05 2023-11-27 02:25:24,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3684120.0, ans=0.0 2023-11-27 02:25:33,363 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11550, loss[loss=0.07156, simple_loss=0.1064, pruned_loss=0.01345, audio_tagging_loss=0.004902, over 16056.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08972, pruned_loss=0.012, audio_tagging_loss=0.008496, over 3048590.63 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:25:55,509 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552650 2023-11-27 02:25:57,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3684320.0, ans=0.125 2023-11-27 02:26:05,041 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:26:06,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3684386.6666666665, ans=0.0 2023-11-27 02:26:28,798 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11600, loss[loss=0.07085, simple_loss=0.0986, pruned_loss=0.01224, audio_tagging_loss=0.009313, over 15383.00 frames. 
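[Note] The scaling.py:213 entries print module hyperparameters (dropout_p, balancer prob, skip rates, scale_min) whose current value is a function of batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints that is constant outside the given range, which would explain why every value printed this late in training is pinned at its final constant; PiecewiseSchedule is a stand-in name, not icefall's ScheduledFloat.

class PiecewiseSchedule:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches, then
# constant, so late in training it always prints ans=0.1:
dropout_p = PiecewiseSchedule((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(3_680_520.0) == 0.1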
], tot_loss[loss=0.06605, simple_loss=0.09078, pruned_loss=0.01219, audio_tagging_loss=0.008466, over 3048381.68 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:26:32,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3684520.0, ans=0.125 2023-11-27 02:26:36,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.66 vs. limit=15.0 2023-11-27 02:26:46,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3684586.6666666665, ans=0.1 2023-11-27 02:26:50,924 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552700 2023-11-27 02:26:53,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.971e+01 9.757e+01 1.054e+02 1.398e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 02:26:59,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3684653.3333333335, ans=0.1 2023-11-27 02:27:08,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3684720.0, ans=0.125 2023-11-27 02:27:14,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3684786.6666666665, ans=0.2 2023-11-27 02:27:15,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3684786.6666666665, ans=0.125 2023-11-27 02:27:24,036 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11650, loss[loss=0.05895, simple_loss=0.07707, pruned_loss=0.01024, audio_tagging_loss=0.01018, over 16444.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.09012, pruned_loss=0.01203, audio_tagging_loss=0.00844, over 3042582.10 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:27:28,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3684853.3333333335, ans=0.125 2023-11-27 02:27:29,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3684853.3333333335, ans=0.0 2023-11-27 02:27:35,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3684920.0, ans=0.125 2023-11-27 02:27:41,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.79 vs. 
limit=22.5 2023-11-27 02:27:46,617 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552750 2023-11-27 02:27:51,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3684986.6666666665, ans=0.1 2023-11-27 02:28:10,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3685120.0, ans=0.0 2023-11-27 02:28:10,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3685120.0, ans=0.0 2023-11-27 02:28:19,274 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11700, loss[loss=0.06426, simple_loss=0.09138, pruned_loss=0.009885, audio_tagging_loss=0.008683, over 14466.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08986, pruned_loss=0.01201, audio_tagging_loss=0.00849, over 3043160.74 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:28:33,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-27 02:28:39,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3685253.3333333335, ans=0.0 2023-11-27 02:28:41,481 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552800 2023-11-27 02:28:41,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3685320.0, ans=0.125 2023-11-27 02:28:44,820 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.894e+01 9.455e+01 1.028e+02 1.281e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 02:28:54,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3685386.6666666665, ans=0.2 2023-11-27 02:28:57,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3685386.6666666665, ans=0.5 2023-11-27 02:29:05,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3685453.3333333335, ans=0.1 2023-11-27 02:29:06,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3685453.3333333335, ans=0.125 2023-11-27 02:29:15,646 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11750, loss[loss=0.04933, simple_loss=0.06266, pruned_loss=0.004718, audio_tagging_loss=0.01329, over 14327.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08891, pruned_loss=0.01189, audio_tagging_loss=0.008612, over 3038061.56 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:29:19,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. 
limit=12.0 2023-11-27 02:29:21,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3685520.0, ans=0.125 2023-11-27 02:29:21,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3685520.0, ans=0.2 2023-11-27 02:29:28,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3685586.6666666665, ans=0.125 2023-11-27 02:29:29,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3685586.6666666665, ans=0.125 2023-11-27 02:29:34,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3685586.6666666665, ans=0.125 2023-11-27 02:29:38,004 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552850 2023-11-27 02:29:39,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-27 02:29:44,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-27 02:30:06,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3685786.6666666665, ans=0.125 2023-11-27 02:30:10,482 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11800, loss[loss=0.04812, simple_loss=0.0593, pruned_loss=0.007149, audio_tagging_loss=0.01132, over 15522.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08874, pruned_loss=0.01204, audio_tagging_loss=0.008637, over 3038823.69 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:30:11,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3685853.3333333335, ans=0.0 2023-11-27 02:30:13,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3685853.3333333335, ans=0.125 2023-11-27 02:30:30,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3685920.0, ans=0.0 2023-11-27 02:30:33,998 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552900 2023-11-27 02:30:37,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.095e+01 8.976e+01 9.582e+01 1.019e+02 1.579e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 02:30:40,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3685986.6666666665, ans=0.2 2023-11-27 02:30:41,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.53 vs. limit=8.0 2023-11-27 02:30:42,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3685986.6666666665, ans=0.1 2023-11-27 02:31:06,369 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11850, loss[loss=0.06545, simple_loss=0.08575, pruned_loss=0.01422, audio_tagging_loss=0.008356, over 14467.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08951, pruned_loss=0.01217, audio_tagging_loss=0.008699, over 3042128.33 frames. 
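[Note] The scaling.py:1022 "Whitening" lines compare a per-module metric against a limit (e.g. metric=7.53 vs. limit=8.0 for encoder_embed.out_whiten above) and appear to log only when the metric approaches or exceeds it. One plausible reading, offered as an assumption rather than icefall's actual definition: the metric measures covariance anisotropy of the activations, equal to 1.0 for perfectly white features.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns E[lambda^2] / (E[lambda])^2
    over covariance eigenvalues; 1.0 means perfectly white features."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 192)           # near-white activations
assert whitening_metric(x) < 2.0     # comfortably under a limit like 8.0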
], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:31:16,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3686186.6666666665, ans=0.0 2023-11-27 02:31:28,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552950 2023-11-27 02:32:02,455 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11900, loss[loss=0.05849, simple_loss=0.07192, pruned_loss=0.01274, audio_tagging_loss=0.00979, over 14293.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08835, pruned_loss=0.01204, audio_tagging_loss=0.00887, over 3043336.82 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:32:13,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-27 02:32:23,614 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553000 2023-11-27 02:32:27,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.694e+01 9.517e+01 1.011e+02 1.462e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 02:32:33,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3686653.3333333335, ans=0.125 2023-11-27 02:32:37,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3686720.0, ans=0.125 2023-11-27 02:32:48,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3686786.6666666665, ans=0.1 2023-11-27 02:32:57,282 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11950, loss[loss=0.07115, simple_loss=0.1065, pruned_loss=0.009627, audio_tagging_loss=0.008271, over 15317.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08754, pruned_loss=0.01179, audio_tagging_loss=0.009, over 3048010.21 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:33:01,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3686853.3333333335, ans=0.0 2023-11-27 02:33:20,256 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553050 2023-11-27 02:33:28,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3686986.6666666665, ans=0.125 2023-11-27 02:33:39,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2023-11-27 02:33:49,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2023-11-27 02:33:51,404 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 12000, loss[loss=0.06692, simple_loss=0.1045, pruned_loss=0.00874, audio_tagging_loss=0.00594, over 14890.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08842, pruned_loss=0.01189, audio_tagging_loss=0.008931, over 3045096.39 frames. 
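[Note] The validation pass just below prints attn_weights_entropy tensors with one value per attention head. A hedged sketch of how such a diagnostic can be computed: per-head entropy of the attention distribution, averaged over query positions. A uniform distribution over S keys gives log(S), so values around 4.8-5.1 suggest fairly diffuse attention (log(150) ~= 5.01).

import torch

def attn_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """attn_weights: (num_heads, tgt_len, src_len), rows summing to 1."""
    eps = 1e-20
    ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return ent.mean(dim=-1)          # one entropy value per head

w = torch.softmax(torch.randn(4, 150, 150), dim=-1)
print(attn_entropy(w))               # a bit below log(150) for random weights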
], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:33:51,404 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 02:34:09,038 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8168, 4.9620, 5.0796, 4.9087], device='cuda:1') 2023-11-27 02:34:23,569 INFO [train_asr.py:1267] (1/4) Epoch 46, validation: loss=0.05804, simple_loss=0.0505, pruned_loss=0.005297, audio_tagging_loss=0.02749, over 4681554.00 frames. 2023-11-27 02:34:23,570 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 02:34:26,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3687186.6666666665, ans=0.04949747468305833 2023-11-27 02:34:29,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3687186.6666666665, ans=0.125 2023-11-27 02:34:32,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.77 vs. limit=22.5 2023-11-27 02:34:36,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.93 vs. limit=6.0 2023-11-27 02:34:44,532 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553100 2023-11-27 02:35:18,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.955e+01 9.759e+01 1.053e+02 1.237e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 02:35:18,871 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 0, loss[loss=0.07736, simple_loss=0.09805, pruned_loss=0.008838, audio_tagging_loss=0.0195, over 15877.00 frames. ], tot_loss[loss=0.07736, simple_loss=0.09805, pruned_loss=0.008838, audio_tagging_loss=0.0195, over 15877.00 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:35:18,871 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 02:35:50,393 INFO [train_asr.py:1267] (1/4) Epoch 47, validation: loss=0.05785, simple_loss=0.05054, pruned_loss=0.005317, audio_tagging_loss=0.02726, over 4681554.00 frames. 2023-11-27 02:35:50,394 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 02:35:57,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3687340.0, ans=0.1 2023-11-27 02:36:26,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2023-11-27 02:36:41,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3687606.6666666665, ans=0.125 2023-11-27 02:36:42,286 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553150 2023-11-27 02:36:45,424 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 50, loss[loss=0.07835, simple_loss=0.09658, pruned_loss=0.01388, audio_tagging_loss=0.01618, over 14801.00 frames. ], tot_loss[loss=0.07396, simple_loss=0.09065, pruned_loss=0.01182, audio_tagging_loss=0.01681, over 690122.51 frames. 
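[Note] Validation runs both at a fixed batch interval (batch 12000 above) and at the start of every epoch (Epoch 47, batch 0), and each pass also reports peak device memory. A sketch of that cadence, assuming a 3000-batch interval (consistent with the check firing at batch 12000) and assuming the memory line comes from torch.cuda.max_memory_allocated; should_validate and log_peak_memory are illustrative names.

import torch

VALID_INTERVAL = 3000   # assumed from the batch indices at which validation fires

def should_validate(batch_idx: int) -> bool:
    return batch_idx % VALID_INTERVAL == 0   # batch 0, 3000, ..., 12000, ...

def log_peak_memory(device: torch.device) -> str:
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return f"Maximum memory allocated so far is {mb}MB"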
], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:36:46,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3687673.3333333335, ans=0.04949747468305833 2023-11-27 02:36:51,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3687673.3333333335, ans=0.125 2023-11-27 02:37:06,516 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:37:08,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3687806.6666666665, ans=0.05 2023-11-27 02:37:10,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2023-11-27 02:37:37,428 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553200 2023-11-27 02:37:41,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.064e+01 9.815e+01 1.050e+02 1.145e+02 1.417e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-27 02:37:41,678 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 100, loss[loss=0.09077, simple_loss=0.1182, pruned_loss=0.0195, audio_tagging_loss=0.01218, over 14714.00 frames. ], tot_loss[loss=0.07432, simple_loss=0.09187, pruned_loss=0.01252, audio_tagging_loss=0.01587, over 1217287.00 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:37:42,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=12.0 2023-11-27 02:37:54,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3688073.3333333335, ans=0.2 2023-11-27 02:37:58,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-27 02:38:02,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3688140.0, ans=0.2 2023-11-27 02:38:05,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=3688140.0, ans=22.5 2023-11-27 02:38:08,894 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:38:10,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3688140.0, ans=0.2 2023-11-27 02:38:12,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688140.0, ans=0.1 2023-11-27 02:38:22,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3688206.6666666665, ans=0.0 2023-11-27 02:38:34,157 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553250 2023-11-27 02:38:37,325 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 150, loss[loss=0.06889, simple_loss=0.09167, pruned_loss=0.01334, audio_tagging_loss=0.009713, over 14945.00 frames. ], tot_loss[loss=0.0734, simple_loss=0.0935, pruned_loss=0.01272, audio_tagging_loss=0.01392, over 1624146.54 frames. 
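[Note] The running tot_loss[...] statistic is reported "over" a frame count that restarts near 690k at epoch 47 batch 50 and climbs back toward ~3.05M, with fractional values like 3056977.38. Both facts are consistent with an exponentially decayed, frame-weighted average that resets at each epoch boundary. The sketch below assumes a decay of 1 - 1/200: at roughly 15.3k frames per batch it saturates near 200 * 15.3k ~= 3.06M and predicts about 678k / 1.21M / 1.62M frames after 50 / 100 / 150 batches, close to the logged 690k / 1.22M / 1.62M; the decay constant and class name are assumptions.

DECAY = 1.0 - 1.0 / 200.0   # assumed reset_interval of 200 batches

class TotLoss:
    def __init__(self):
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        # Fold each batch into an exponentially decayed, frame-weighted sum.
        self.weighted_loss = self.weighted_loss * DECAY + batch_loss * batch_frames
        self.frames = self.frames * DECAY + batch_frames

    @property
    def value(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)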
], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:38:47,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3688406.6666666665, ans=0.0 2023-11-27 02:38:53,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3688406.6666666665, ans=0.2 2023-11-27 02:38:57,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3688473.3333333335, ans=0.1 2023-11-27 02:39:02,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5 2023-11-27 02:39:29,325 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553300 2023-11-27 02:39:32,410 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 200, loss[loss=0.09145, simple_loss=0.1264, pruned_loss=0.01803, audio_tagging_loss=0.01025, over 16144.00 frames. ], tot_loss[loss=0.07235, simple_loss=0.09445, pruned_loss=0.01281, audio_tagging_loss=0.01231, over 1948740.64 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:39:32,601 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:39:34,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.211e+01 9.198e+01 9.713e+01 1.048e+02 1.227e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 02:40:14,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3688873.3333333335, ans=0.125 2023-11-27 02:40:19,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688940.0, ans=0.1 2023-11-27 02:40:24,214 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553350 2023-11-27 02:40:27,881 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 250, loss[loss=0.08966, simple_loss=0.1312, pruned_loss=0.01922, audio_tagging_loss=0.004843, over 16019.00 frames. ], tot_loss[loss=0.07115, simple_loss=0.09394, pruned_loss=0.01292, audio_tagging_loss=0.01126, over 2193578.02 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:40:33,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3689006.6666666665, ans=0.125 2023-11-27 02:40:38,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3689006.6666666665, ans=0.2 2023-11-27 02:40:44,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3689073.3333333335, ans=0.2 2023-11-27 02:41:00,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3689206.6666666665, ans=0.0 2023-11-27 02:41:09,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3689206.6666666665, ans=0.0 2023-11-27 02:41:14,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.21 vs. 
limit=15.0 2023-11-27 02:41:17,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3689273.3333333335, ans=0.04949747468305833 2023-11-27 02:41:21,356 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553400 2023-11-27 02:41:24,704 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 300, loss[loss=0.07642, simple_loss=0.1076, pruned_loss=0.015, audio_tagging_loss=0.00764, over 15619.00 frames. ], tot_loss[loss=0.06999, simple_loss=0.09323, pruned_loss=0.01285, audio_tagging_loss=0.01053, over 2376160.74 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:41:26,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.232e+01 1.015e+02 1.128e+02 1.500e+02, threshold=2.030e+02, percent-clipped=0.0 2023-11-27 02:41:28,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3689340.0, ans=0.125 2023-11-27 02:41:54,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3689473.3333333335, ans=0.125 2023-11-27 02:42:14,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3689606.6666666665, ans=0.125 2023-11-27 02:42:16,583 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553450 2023-11-27 02:42:19,758 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 350, loss[loss=0.09411, simple_loss=0.1294, pruned_loss=0.0204, audio_tagging_loss=0.009007, over 16174.00 frames. ], tot_loss[loss=0.06915, simple_loss=0.09285, pruned_loss=0.01273, audio_tagging_loss=0.009999, over 2531971.07 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:42:20,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3689673.3333333335, ans=0.1 2023-11-27 02:42:53,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3689873.3333333335, ans=0.09899494936611666 2023-11-27 02:43:00,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3689873.3333333335, ans=0.125 2023-11-27 02:43:11,779 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553500 2023-11-27 02:43:14,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3690006.6666666665, ans=0.0 2023-11-27 02:43:15,466 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 400, loss[loss=0.05896, simple_loss=0.07724, pruned_loss=0.01106, audio_tagging_loss=0.009282, over 14026.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09115, pruned_loss=0.01247, audio_tagging_loss=0.00959, over 2641265.94 frames. 
], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:43:18,097 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.006e+01 8.931e+01 9.402e+01 1.042e+02 1.214e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 02:43:22,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3690006.6666666665, ans=0.125 2023-11-27 02:43:32,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3690073.3333333335, ans=0.125 2023-11-27 02:43:37,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690140.0, ans=0.1 2023-11-27 02:43:37,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3690140.0, ans=0.125 2023-11-27 02:43:38,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3690140.0, ans=0.0 2023-11-27 02:43:42,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690140.0, ans=0.1 2023-11-27 02:44:06,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3690273.3333333335, ans=0.0 2023-11-27 02:44:08,449 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553550 2023-11-27 02:44:11,509 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 450, loss[loss=0.06189, simple_loss=0.0761, pruned_loss=0.01019, audio_tagging_loss=0.01365, over 15237.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09013, pruned_loss=0.01226, audio_tagging_loss=0.009387, over 2730360.06 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:44:13,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3690340.0, ans=0.125 2023-11-27 02:44:19,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3690340.0, ans=0.0 2023-11-27 02:44:21,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3690406.6666666665, ans=0.125 2023-11-27 02:44:38,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3690473.3333333335, ans=0.125 2023-11-27 02:45:03,909 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553600 2023-11-27 02:45:03,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3690606.6666666665, ans=0.0 2023-11-27 02:45:07,290 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 500, loss[loss=0.05134, simple_loss=0.06555, pruned_loss=0.0102, audio_tagging_loss=0.008366, over 15404.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08989, pruned_loss=0.01216, audio_tagging_loss=0.009205, over 2800663.40 frames. 
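[Note] The lr field sits at 1.46e-03 through epoch 46 and steps to 1.44e-03 once epoch 47 begins, matching an Eden-style schedule that decays with both batch index and epoch. A hedged reconstruction, assuming base_lr=0.045, lr_batches=7500, lr_epochs=3.5 and scheduler epochs counted from 0; these constants are assumptions, checked only against the two logged lr values.

def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(round(eden_lr(0.045, 552100, 45), 5))  # ~0.00146, as logged in epoch 46
print(round(eden_lr(0.045, 553150, 46), 5))  # ~0.00144, as logged in epoch 47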
], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:45:09,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.932e+01 9.491e+01 1.008e+02 1.797e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 02:45:36,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3690806.6666666665, ans=0.07 2023-11-27 02:45:59,596 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553650 2023-11-27 02:45:59,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3690940.0, ans=0.125 2023-11-27 02:46:02,700 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 550, loss[loss=0.06584, simple_loss=0.08628, pruned_loss=0.01301, audio_tagging_loss=0.00969, over 15281.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08994, pruned_loss=0.01234, audio_tagging_loss=0.009166, over 2848168.44 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:46:18,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.17 vs. limit=6.0 2023-11-27 02:46:40,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3691206.6666666665, ans=0.05 2023-11-27 02:46:42,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3691206.6666666665, ans=0.1 2023-11-27 02:46:45,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3691206.6666666665, ans=0.125 2023-11-27 02:46:51,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3691273.3333333335, ans=0.0 2023-11-27 02:46:55,933 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553700 2023-11-27 02:46:59,540 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 600, loss[loss=0.07759, simple_loss=0.1152, pruned_loss=0.01419, audio_tagging_loss=0.005778, over 15477.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08905, pruned_loss=0.01214, audio_tagging_loss=0.009132, over 2890884.53 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:47:01,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.835e+01 9.409e+01 1.013e+02 1.233e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 02:47:11,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3691406.6666666665, ans=0.0 2023-11-27 02:47:17,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3691406.6666666665, ans=0.07 2023-11-27 02:47:19,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3691406.6666666665, ans=0.1 2023-11-27 02:47:28,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3691473.3333333335, ans=0.125 2023-11-27 02:47:34,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3691540.0, ans=0.0 2023-11-27 02:47:42,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3691540.0, ans=0.125 2023-11-27 02:47:51,517 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553750 2023-11-27 02:47:55,220 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 650, loss[loss=0.05512, simple_loss=0.07357, pruned_loss=0.008378, audio_tagging_loss=0.009963, over 15429.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08876, pruned_loss=0.01197, audio_tagging_loss=0.009068, over 2918643.94 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:48:10,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3691740.0, ans=0.025 2023-11-27 02:48:45,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3691940.0, ans=0.2 2023-11-27 02:48:46,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3691940.0, ans=0.0 2023-11-27 02:48:46,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3691940.0, ans=0.125 2023-11-27 02:48:47,618 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553800 2023-11-27 02:48:51,055 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 700, loss[loss=0.05164, simple_loss=0.06949, pruned_loss=0.007017, audio_tagging_loss=0.009878, over 14459.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08946, pruned_loss=0.01212, audio_tagging_loss=0.009021, over 2945828.03 frames. 
], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:48:53,138 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.109e+01 8.860e+01 9.509e+01 1.038e+02 1.459e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 02:48:55,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3692006.6666666665, ans=0.0 2023-11-27 02:49:02,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3692073.3333333335, ans=0.1 2023-11-27 02:49:13,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3692140.0, ans=0.125 2023-11-27 02:49:24,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3692206.6666666665, ans=0.04949747468305833 2023-11-27 02:49:40,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=22.5 2023-11-27 02:49:44,254 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553850 2023-11-27 02:49:47,904 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 750, loss[loss=0.09285, simple_loss=0.128, pruned_loss=0.02082, audio_tagging_loss=0.008036, over 15553.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08962, pruned_loss=0.01217, audio_tagging_loss=0.008948, over 2972005.57 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:49:52,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3692340.0, ans=0.2 2023-11-27 02:50:07,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3692406.6666666665, ans=0.0 2023-11-27 02:50:16,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3692473.3333333335, ans=0.125 2023-11-27 02:50:16,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3692473.3333333335, ans=0.0 2023-11-27 02:50:40,010 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553900 2023-11-27 02:50:43,108 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 800, loss[loss=0.05884, simple_loss=0.08151, pruned_loss=0.008644, audio_tagging_loss=0.009442, over 15271.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08967, pruned_loss=0.01199, audio_tagging_loss=0.008888, over 2988065.16 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:50:45,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.051e+01 9.571e+01 1.030e+02 1.342e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-27 02:51:07,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3692806.6666666665, ans=0.125 2023-11-27 02:51:23,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3692873.3333333335, ans=0.125 2023-11-27 02:51:35,961 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553950 2023-11-27 02:51:39,032 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 850, loss[loss=0.09462, simple_loss=0.1323, pruned_loss=0.01998, audio_tagging_loss=0.008513, over 16758.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08928, pruned_loss=0.01204, audio_tagging_loss=0.00906, over 2997755.26 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:51:43,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3693006.6666666665, ans=0.0 2023-11-27 02:51:43,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3693006.6666666665, ans=0.2 2023-11-27 02:51:58,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3693073.3333333335, ans=0.07 2023-11-27 02:52:06,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3693140.0, ans=10.0 2023-11-27 02:52:30,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3693273.3333333335, ans=0.125 2023-11-27 02:52:32,229 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554000 2023-11-27 02:52:35,566 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 900, loss[loss=0.0677, simple_loss=0.09396, pruned_loss=0.01148, audio_tagging_loss=0.009243, over 16151.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08997, pruned_loss=0.01213, audio_tagging_loss=0.009006, over 3009052.93 frames. 
], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:52:35,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3693340.0, ans=0.2 2023-11-27 02:52:39,247 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.856e+01 9.562e+01 1.034e+02 1.273e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 02:52:45,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3693406.6666666665, ans=0.0 2023-11-27 02:52:47,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3693406.6666666665, ans=0.2 2023-11-27 02:52:48,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3693406.6666666665, ans=0.125 2023-11-27 02:53:11,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3693540.0, ans=0.0 2023-11-27 02:53:15,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-11-27 02:53:17,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3693540.0, ans=10.0 2023-11-27 02:53:18,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3693540.0, ans=0.07 2023-11-27 02:53:25,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3693606.6666666665, ans=0.1 2023-11-27 02:53:28,056 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554050 2023-11-27 02:53:30,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.67 vs. limit=15.0 2023-11-27 02:53:31,192 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 950, loss[loss=0.07301, simple_loss=0.09485, pruned_loss=0.01917, audio_tagging_loss=0.006415, over 15618.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08972, pruned_loss=0.01209, audio_tagging_loss=0.008901, over 3016888.59 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:54:09,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3693873.3333333335, ans=0.125 2023-11-27 02:54:23,168 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554100 2023-11-27 02:54:26,302 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1000, loss[loss=0.05642, simple_loss=0.08317, pruned_loss=0.008374, audio_tagging_loss=0.006458, over 15790.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08994, pruned_loss=0.01216, audio_tagging_loss=0.008695, over 3020359.89 frames. 
2023-11-27 02:54:28,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3694006.6666666665, ans=0.0
2023-11-27 02:54:29,441 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 9.006e+01 9.495e+01 1.025e+02 1.376e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-27 02:54:38,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3694073.3333333335, ans=0.1
2023-11-27 02:54:39,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3694073.3333333335, ans=0.125
2023-11-27 02:54:39,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3694073.3333333335, ans=0.125
2023-11-27 02:54:48,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3694140.0, ans=0.1
2023-11-27 02:54:49,622 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 02:54:52,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3694140.0, ans=0.125
2023-11-27 02:55:05,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3694206.6666666665, ans=0.0
2023-11-27 02:55:06,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.76 vs. limit=15.0
2023-11-27 02:55:09,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3694273.3333333335, ans=0.0
2023-11-27 02:55:14,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3694273.3333333335, ans=0.125
2023-11-27 02:55:19,471 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554150
2023-11-27 02:55:23,088 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1050, loss[loss=0.05261, simple_loss=0.07134, pruned_loss=0.00929, audio_tagging_loss=0.007644, over 15416.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.0899, pruned_loss=0.01218, audio_tagging_loss=0.008554, over 3030314.43 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 02:55:27,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3694340.0, ans=0.0
2023-11-27 02:55:32,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=22.5
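The WARNING above drops an AudioSet cut whose 100 input frames shrink to 23 after the encoder's subsampling, while its placeholder transcript encodes to 24 BPE tokens; a transducer alignment emits each token on some encoder frame, so such cuts appear to be filtered out. A minimal sketch of that check (keep_cut is a hypothetical name, not the actual train_asr.py function):

```python
def keep_cut(num_frames_after_subsampling: int, num_tokens: int) -> bool:
    # A cut whose subsampled length is shorter than its token sequence
    # cannot be aligned by the transducer loss and has to be dropped.
    return num_frames_after_subsampling >= num_tokens

# The cut above: 23 frames after subsampling vs. 24 BPE tokens -> excluded.
assert keep_cut(23, 24) is False
```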
2023-11-27 02:55:53,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3694473.3333333335, ans=0.0
2023-11-27 02:56:07,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3694606.6666666665, ans=0.0
2023-11-27 02:56:13,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3694606.6666666665, ans=0.125
2023-11-27 02:56:15,430 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554200
2023-11-27 02:56:18,830 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1100, loss[loss=0.08075, simple_loss=0.1074, pruned_loss=0.01921, audio_tagging_loss=0.00783, over 15463.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08977, pruned_loss=0.01221, audio_tagging_loss=0.008601, over 3031700.05 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 02:56:19,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3694673.3333333335, ans=0.0
2023-11-27 02:56:19,887 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 02:56:21,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.965e+01 9.717e+01 1.039e+02 1.284e+02, threshold=1.943e+02, percent-clipped=0.0
2023-11-27 02:56:24,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3694673.3333333335, ans=0.125
2023-11-27 02:56:38,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3694740.0, ans=22.5
2023-11-27 02:56:47,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3694806.6666666665, ans=0.125
2023-11-27 02:56:56,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3694873.3333333335, ans=0.125
2023-11-27 02:57:01,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3694873.3333333335, ans=0.125
2023-11-27 02:57:10,360 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554250
2023-11-27 02:57:13,444 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1150, loss[loss=0.06821, simple_loss=0.102, pruned_loss=0.01174, audio_tagging_loss=0.005448, over 15087.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.0888, pruned_loss=0.01203, audio_tagging_loss=0.008596, over 3032211.25 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 02:57:18,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.20 vs. limit=8.0
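In each optim.py record above, the logged threshold is almost exactly 2.0 times the median grad-norm quartile (e.g. 2.0 x 9.717e+01 is roughly the 1.943e+02 just logged), i.e. Clipping_scale times a running median. A simplified sketch of median-based clipping in that spirit; the real ScaledAdam logic in icefall is per-parameter-group and more elaborate:

```python
import torch

# Simplified median-based gradient clipping: keep a window of recent gradient
# norms and clip whenever the current norm exceeds clipping_scale * median.
def clip_by_median(params, recent_norms, clipping_scale=2.0):
    params = [p for p in params if p.grad is not None]
    norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    recent_norms.append(norm.item())
    window = torch.tensor(recent_norms[-128:])
    q1, med, q3 = torch.quantile(window, torch.tensor([0.25, 0.50, 0.75]))
    threshold = clipping_scale * med  # e.g. 2.0 * 9.717e+01 ~= 1.943e+02
    if norm > threshold:
        for p in params:
            p.grad.mul_(threshold / norm)
    return norm.item(), threshold.item()
```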
2023-11-27 02:57:37,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3695140.0, ans=0.1
2023-11-27 02:57:43,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3695140.0, ans=0.0
2023-11-27 02:57:44,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0
2023-11-27 02:57:50,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3695206.6666666665, ans=0.125
2023-11-27 02:58:05,433 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554300
2023-11-27 02:58:08,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3695340.0, ans=0.125
2023-11-27 02:58:09,075 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1200, loss[loss=0.06346, simple_loss=0.08974, pruned_loss=0.007814, audio_tagging_loss=0.01078, over 15707.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08881, pruned_loss=0.01207, audio_tagging_loss=0.008545, over 3031119.96 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 02:58:09,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=22.5
2023-11-27 02:58:10,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3695340.0, ans=0.1
2023-11-27 02:58:12,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.838e+01 8.950e+01 9.657e+01 1.053e+02 1.302e+02, threshold=1.931e+02, percent-clipped=0.0
2023-11-27 02:58:42,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3695540.0, ans=0.0
2023-11-27 02:58:57,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0
2023-11-27 02:59:02,059 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554350
2023-11-27 02:59:05,177 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1250, loss[loss=0.06384, simple_loss=0.08723, pruned_loss=0.01, audio_tagging_loss=0.01023, over 13874.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08886, pruned_loss=0.0121, audio_tagging_loss=0.008577, over 3025312.10 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 02:59:17,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=12.0
2023-11-27 02:59:19,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3695740.0, ans=0.0
2023-11-27 02:59:21,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0
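The scaling.py:213 records log hyperparameters (dropout probabilities, skip rates, balancer limits) whose "ans" value is a function of batch_count, i.e. schedules rather than constants. The stand-in below uses made-up breakpoints to show the idea; icefall's actual ScheduledFloat is piecewise-linear in roughly this way:

```python
# Illustrative stand-in for a batch-count-dependent hyperparameter. The
# breakpoints are invented; the logged "ans=..." value is simply the
# schedule evaluated at the current batch_count.
class PiecewiseLinear:
    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

conv_skip_rate = PiecewiseLinear((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(conv_skip_rate(3695940.0))  # far past the last breakpoint -> 0.0
```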
2023-11-27 02:59:27,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3695806.6666666665, ans=0.0
2023-11-27 02:59:36,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3695806.6666666665, ans=0.0
2023-11-27 02:59:51,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3695940.0, ans=0.0
2023-11-27 02:59:56,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3695940.0, ans=0.1
2023-11-27 02:59:57,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554400
2023-11-27 02:59:58,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3695940.0, ans=0.2
2023-11-27 03:00:01,010 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1300, loss[loss=0.05797, simple_loss=0.06595, pruned_loss=0.0139, audio_tagging_loss=0.0111, over 14828.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08799, pruned_loss=0.01197, audio_tagging_loss=0.008596, over 3028165.23 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:00:04,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.987e+01 9.539e+01 1.033e+02 1.348e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-27 03:00:20,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3696073.3333333335, ans=0.125
2023-11-27 03:00:28,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3696140.0, ans=0.0
2023-11-27 03:00:35,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0
2023-11-27 03:00:47,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3696273.3333333335, ans=0.125
2023-11-27 03:00:48,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3696273.3333333335, ans=0.0
2023-11-27 03:00:50,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0
2023-11-27 03:00:53,274 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554450
2023-11-27 03:00:56,945 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1350, loss[loss=0.07013, simple_loss=0.09077, pruned_loss=0.01378, audio_tagging_loss=0.01096, over 15625.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08928, pruned_loss=0.01211, audio_tagging_loss=0.008549, over 3036153.78 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:01:02,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3696340.0, ans=0.1
2023-11-27 03:01:16,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0
2023-11-27 03:01:19,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.92 vs. limit=22.5
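The Whitening records compare a per-module metric against a limit; the metric grows as a module's output covariance drifts away from a multiple of the identity. The proxy below (ratio of mean squared eigenvalue to squared mean eigenvalue, which is 1.0 for perfectly white features) is only illustrative; the exact formula in scaling.py differs:

```python
import torch

# Rough, hypothetical whiteness proxy for a (frames x channels) activation
# matrix. The ratio equals 1.0 when all covariance eigenvalues are equal and
# grows as the eigenvalue spectrum becomes lopsided.
def whiteness_metric(x: torch.Tensor) -> float:
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

print(whiteness_metric(torch.randn(1000, 192)))  # close to 1.0 for white noise
```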
2023-11-27 03:01:22,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3696473.3333333335, ans=0.125
2023-11-27 03:01:28,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. limit=6.0
2023-11-27 03:01:35,752 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 03:01:43,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3696606.6666666665, ans=0.0
2023-11-27 03:01:50,038 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554500
2023-11-27 03:01:52,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3696673.3333333335, ans=0.1
2023-11-27 03:01:53,142 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1400, loss[loss=0.07179, simple_loss=0.1004, pruned_loss=0.01363, audio_tagging_loss=0.007968, over 15442.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08862, pruned_loss=0.01194, audio_tagging_loss=0.008602, over 3037851.66 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:01:57,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.796e+01 9.481e+01 1.017e+02 1.266e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-27 03:01:57,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3696673.3333333335, ans=0.0
2023-11-27 03:01:59,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3696673.3333333335, ans=0.025
2023-11-27 03:02:03,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3696740.0, ans=0.125
2023-11-27 03:02:10,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0
2023-11-27 03:02:26,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0
2023-11-27 03:02:38,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3696940.0, ans=0.125
2023-11-27 03:02:44,918 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554550
2023-11-27 03:02:48,021 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1450, loss[loss=0.07974, simple_loss=0.1137, pruned_loss=0.01558, audio_tagging_loss=0.007314, over 15864.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08975, pruned_loss=0.01214, audio_tagging_loss=0.008561, over 3037825.76 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:02:58,897 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:03:03,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3697073.3333333335, ans=0.125
2023-11-27 03:03:22,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3697206.6666666665, ans=0.2
2023-11-27 03:03:23,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3697206.6666666665, ans=0.1
2023-11-27 03:03:40,313 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554600
2023-11-27 03:03:43,664 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1500, loss[loss=0.06331, simple_loss=0.08924, pruned_loss=0.01265, audio_tagging_loss=0.00604, over 13768.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09026, pruned_loss=0.01215, audio_tagging_loss=0.00859, over 3034071.18 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:03:48,363 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 9.023e+01 9.880e+01 1.062e+02 1.307e+02, threshold=1.976e+02, percent-clipped=0.0
2023-11-27 03:03:57,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3697406.6666666665, ans=0.125
2023-11-27 03:03:59,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3697406.6666666665, ans=0.0
2023-11-27 03:04:00,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3697406.6666666665, ans=0.1
2023-11-27 03:04:36,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554650
2023-11-27 03:04:40,349 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1550, loss[loss=0.05245, simple_loss=0.07022, pruned_loss=0.008688, audio_tagging_loss=0.008651, over 15231.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08919, pruned_loss=0.01201, audio_tagging_loss=0.008704, over 3032637.50 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:04:53,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3697740.0, ans=0.125
2023-11-27 03:04:58,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3697740.0, ans=0.125
2023-11-27 03:05:00,331 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:05:13,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=22.5
2023-11-27 03:05:22,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3697873.3333333335, ans=0.125
2023-11-27 03:05:32,953 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554700
2023-11-27 03:05:36,046 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1600, loss[loss=0.06939, simple_loss=0.08857, pruned_loss=0.01433, audio_tagging_loss=0.01078, over 14586.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08902, pruned_loss=0.01202, audio_tagging_loss=0.008784, over 3033085.58 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:05:41,277 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.979e+01 9.588e+01 1.025e+02 1.510e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-27 03:05:44,634 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:05:49,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3698073.3333333335, ans=6.0
2023-11-27 03:05:57,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3698140.0, ans=0.1
2023-11-27 03:05:58,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3698140.0, ans=0.0
2023-11-27 03:06:02,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3698140.0, ans=0.125
2023-11-27 03:06:02,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3698140.0, ans=0.02
2023-11-27 03:06:27,977 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554750
2023-11-27 03:06:27,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3698273.3333333335, ans=0.125
2023-11-27 03:06:31,091 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1650, loss[loss=0.07215, simple_loss=0.1003, pruned_loss=0.01355, audio_tagging_loss=0.008468, over 15366.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08846, pruned_loss=0.01182, audio_tagging_loss=0.008852, over 3039941.69 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:06:32,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3698340.0, ans=0.025
2023-11-27 03:06:36,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0
2023-11-27 03:06:44,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3698406.6666666665, ans=0.125
2023-11-27 03:06:45,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0
2023-11-27 03:06:58,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3698473.3333333335, ans=0.2
2023-11-27 03:07:23,994 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554800
2023-11-27 03:07:27,477 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1700, loss[loss=0.07276, simple_loss=0.1064, pruned_loss=0.01243, audio_tagging_loss=0.007136, over 15936.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08943, pruned_loss=0.01201, audio_tagging_loss=0.008799, over 3041357.42 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:07:33,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.918e+01 9.493e+01 1.014e+02 1.179e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-27 03:07:47,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3698740.0, ans=0.2
2023-11-27 03:07:49,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.76 vs. limit=15.0
2023-11-27 03:08:08,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3698873.3333333335, ans=0.07
2023-11-27 03:08:20,293 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554850
2023-11-27 03:08:24,002 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1750, loss[loss=0.08587, simple_loss=0.1207, pruned_loss=0.01994, audio_tagging_loss=0.005601, over 13958.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08897, pruned_loss=0.01181, audio_tagging_loss=0.008761, over 3030918.33 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:09:06,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0
2023-11-27 03:09:08,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=15.0
2023-11-27 03:09:09,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3699273.3333333335, ans=0.125
2023-11-27 03:09:16,563 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554900
2023-11-27 03:09:19,711 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1800, loss[loss=0.07273, simple_loss=0.09294, pruned_loss=0.0151, audio_tagging_loss=0.01116, over 14868.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08975, pruned_loss=0.012, audio_tagging_loss=0.008737, over 3036530.95 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0
2023-11-27 03:09:26,755 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.924e+01 9.583e+01 9.926e+01 1.257e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-27 03:09:31,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=15.0
2023-11-27 03:09:35,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3699406.6666666665, ans=0.125
2023-11-27 03:09:42,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3699473.3333333335, ans=0.125
2023-11-27 03:10:08,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0
2023-11-27 03:10:11,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3699606.6666666665, ans=0.125
2023-11-27 03:10:12,789 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554950
2023-11-27 03:10:16,019 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1850, loss[loss=0.07833, simple_loss=0.1012, pruned_loss=0.01796, audio_tagging_loss=0.009767, over 15185.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08939, pruned_loss=0.01191, audio_tagging_loss=0.008726, over 3035397.88 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0
2023-11-27 03:10:20,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0
2023-11-27 03:10:25,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.10 vs. limit=12.0
2023-11-27 03:10:38,525 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:10:53,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3699873.3333333335, ans=0.125
2023-11-27 03:11:08,682 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555000
2023-11-27 03:11:12,132 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1900, loss[loss=0.04877, simple_loss=0.06542, pruned_loss=0.006936, audio_tagging_loss=0.009125, over 16272.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08985, pruned_loss=0.01197, audio_tagging_loss=0.008606, over 3039165.71 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 8.0
2023-11-27 03:11:13,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3700006.6666666665, ans=0.125
2023-11-27 03:11:16,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3700006.6666666665, ans=0.125
2023-11-27 03:11:18,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 9.107e+01 9.832e+01 1.049e+02 1.489e+02, threshold=1.966e+02, percent-clipped=0.0
2023-11-27 03:11:21,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3700006.6666666665, ans=0.02
2023-11-27 03:11:55,081 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:12:05,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555050
2023-11-27 03:12:07,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0
2023-11-27 03:12:08,238 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1950, loss[loss=0.06123, simple_loss=0.08465, pruned_loss=0.01015, audio_tagging_loss=0.008754, over 15118.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08962, pruned_loss=0.01204, audio_tagging_loss=0.008524, over 3042672.93 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0
2023-11-27 03:12:08,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3700340.0, ans=0.0
2023-11-27 03:12:14,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.20 vs. limit=22.5
2023-11-27 03:12:16,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3700340.0, ans=0.125
2023-11-27 03:12:26,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3700406.6666666665, ans=0.2
2023-11-27 03:12:34,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3700473.3333333335, ans=0.0
2023-11-27 03:12:37,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3700473.3333333335, ans=0.125
2023-11-27 03:12:41,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.35 vs. limit=10.0
2023-11-27 03:12:42,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3700540.0, ans=0.125
2023-11-27 03:13:00,685 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555100
2023-11-27 03:13:04,286 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2000, loss[loss=0.06738, simple_loss=0.08805, pruned_loss=0.01253, audio_tagging_loss=0.01083, over 15424.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08988, pruned_loss=0.01211, audio_tagging_loss=0.008501, over 3044381.96 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:13:05,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3700673.3333333335, ans=0.125
2023-11-27 03:13:11,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.780e+01 9.356e+01 1.007e+02 1.266e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-27 03:13:11,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3700673.3333333335, ans=0.0
2023-11-27 03:13:19,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=22.5
2023-11-27 03:13:47,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3700873.3333333335, ans=0.125
2023-11-27 03:13:57,159 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555150
2023-11-27 03:13:59,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3701006.6666666665, ans=0.0
2023-11-27 03:14:00,291 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2050, loss[loss=0.05653, simple_loss=0.07685, pruned_loss=0.009834, audio_tagging_loss=0.008267, over 15473.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08891, pruned_loss=0.01197, audio_tagging_loss=0.008588, over 3038447.66 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:14:25,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3701140.0, ans=0.0
2023-11-27 03:14:49,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.33 vs. limit=10.0
2023-11-27 03:14:52,361 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555200
2023-11-27 03:14:55,683 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2100, loss[loss=0.07448, simple_loss=0.1026, pruned_loss=0.0165, audio_tagging_loss=0.006672, over 16474.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08982, pruned_loss=0.01229, audio_tagging_loss=0.008524, over 3038614.84 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:14:58,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=15.0
2023-11-27 03:15:02,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.818e+01 9.814e+01 1.041e+02 1.368e+02, threshold=1.963e+02, percent-clipped=0.0
2023-11-27 03:15:07,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3701406.6666666665, ans=0.125
2023-11-27 03:15:13,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3701406.6666666665, ans=0.0
2023-11-27 03:15:30,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3701540.0, ans=0.0
2023-11-27 03:15:45,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3701606.6666666665, ans=0.1
2023-11-27 03:15:49,101 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555250
2023-11-27 03:15:52,281 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2150, loss[loss=0.05734, simple_loss=0.0751, pruned_loss=0.01273, audio_tagging_loss=0.007059, over 14582.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09104, pruned_loss=0.01245, audio_tagging_loss=0.008399, over 3042424.65 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:16:07,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3701740.0, ans=0.125
2023-11-27 03:16:17,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0
2023-11-27 03:16:24,277 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 03:16:43,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3701940.0, ans=0.2
2023-11-27 03:16:45,541 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555300
2023-11-27 03:16:48,651 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2200, loss[loss=0.07186, simple_loss=0.0994, pruned_loss=0.01224, audio_tagging_loss=0.009923, over 16294.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09081, pruned_loss=0.0124, audio_tagging_loss=0.008426, over 3044276.69 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:16:53,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0
2023-11-27 03:16:55,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.053e+01 9.078e+01 9.706e+01 1.033e+02 2.180e+02, threshold=1.941e+02, percent-clipped=1.0
2023-11-27 03:17:07,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3702073.3333333335, ans=0.0
2023-11-27 03:17:20,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3702140.0, ans=0.125
2023-11-27 03:17:27,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3702206.6666666665, ans=0.125
2023-11-27 03:17:33,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3702273.3333333335, ans=0.125
2023-11-27 03:17:33,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702273.3333333335, ans=0.1
2023-11-27 03:17:36,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3702273.3333333335, ans=0.125
2023-11-27 03:17:39,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3702273.3333333335, ans=0.1
2023-11-27 03:17:40,544 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555350
2023-11-27 03:17:43,601 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2250, loss[loss=0.08488, simple_loss=0.1241, pruned_loss=0.01506, audio_tagging_loss=0.007781, over 16404.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09084, pruned_loss=0.01242, audio_tagging_loss=0.008471, over 3043821.55 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:17:49,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3702340.0, ans=0.0
2023-11-27 03:18:03,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3702406.6666666665, ans=0.0
2023-11-27 03:18:04,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.86 vs. limit=15.0
2023-11-27 03:18:18,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3702540.0, ans=0.0
2023-11-27 03:18:29,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3702606.6666666665, ans=0.125
2023-11-27 03:18:29,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3702606.6666666665, ans=0.04949747468305833
2023-11-27 03:18:35,775 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555400
2023-11-27 03:18:35,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3702606.6666666665, ans=0.0
2023-11-27 03:18:36,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=22.5
2023-11-27 03:18:39,679 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2300, loss[loss=0.06535, simple_loss=0.09412, pruned_loss=0.0133, audio_tagging_loss=0.004992, over 15300.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09055, pruned_loss=0.01231, audio_tagging_loss=0.00849, over 3043303.99 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:18:40,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3702673.3333333335, ans=0.125
2023-11-27 03:18:46,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 9.197e+01 9.925e+01 1.066e+02 1.274e+02, threshold=1.985e+02, percent-clipped=0.0
2023-11-27 03:19:01,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3702806.6666666665, ans=0.0
2023-11-27 03:19:07,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3702806.6666666665, ans=0.125
2023-11-27 03:19:12,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3702873.3333333335, ans=0.0
2023-11-27 03:19:27,955 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 03:19:28,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3702940.0, ans=0.05
2023-11-27 03:19:32,262 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555450
2023-11-27 03:19:35,983 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2350, loss[loss=0.07208, simple_loss=0.09663, pruned_loss=0.01596, audio_tagging_loss=0.007807, over 15354.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08995, pruned_loss=0.01215, audio_tagging_loss=0.008642, over 3047592.25 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:19:40,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3703006.6666666665, ans=0.1
2023-11-27 03:19:50,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3703073.3333333335, ans=0.0
2023-11-27 03:20:18,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0
2023-11-27 03:20:23,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3703273.3333333335, ans=0.1
2023-11-27 03:20:27,941 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555500
2023-11-27 03:20:29,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.10 vs. limit=22.5
2023-11-27 03:20:31,103 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2400, loss[loss=0.06967, simple_loss=0.08709, pruned_loss=0.01613, audio_tagging_loss=0.009997, over 15137.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08962, pruned_loss=0.01214, audio_tagging_loss=0.00875, over 3049602.33 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:20:37,431 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.851e+01 9.612e+01 1.018e+02 1.276e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-27 03:20:54,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3703473.3333333335, ans=0.125
2023-11-27 03:21:23,246 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555550
2023-11-27 03:21:26,365 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2450, loss[loss=0.08383, simple_loss=0.1151, pruned_loss=0.01956, audio_tagging_loss=0.006736, over 14988.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09004, pruned_loss=0.01223, audio_tagging_loss=0.008775, over 3050726.61 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:21:29,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3703673.3333333335, ans=0.125
2023-11-27 03:21:44,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3703740.0, ans=0.2
2023-11-27 03:21:46,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3703740.0, ans=0.2
2023-11-27 03:21:46,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3703740.0, ans=0.125
2023-11-27 03:21:50,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3703806.6666666665, ans=0.125
2023-11-27 03:22:03,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3703873.3333333335, ans=0.5
2023-11-27 03:22:19,700 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555600
2023-11-27 03:22:19,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3703940.0, ans=0.2
2023-11-27 03:22:23,114 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2500, loss[loss=0.05343, simple_loss=0.07006, pruned_loss=0.01007, audio_tagging_loss=0.008324, over 14555.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08917, pruned_loss=0.01205, audio_tagging_loss=0.008836, over 3051869.57 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:22:24,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.93 vs. limit=22.5
2023-11-27 03:22:30,032 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 9.086e+01 9.685e+01 1.022e+02 1.331e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-27 03:22:30,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0
2023-11-27 03:22:39,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.49 vs. limit=15.0
2023-11-27 03:22:55,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3704206.6666666665, ans=0.125
2023-11-27 03:22:57,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3704206.6666666665, ans=0.1
2023-11-27 03:23:00,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3704206.6666666665, ans=0.125
2023-11-27 03:23:01,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3704206.6666666665, ans=0.2
2023-11-27 03:23:02,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3704206.6666666665, ans=0.125
2023-11-27 03:23:15,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555650
2023-11-27 03:23:18,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3704340.0, ans=0.1
2023-11-27 03:23:18,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3704340.0, ans=0.0
2023-11-27 03:23:18,859 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2550, loss[loss=0.0821, simple_loss=0.1131, pruned_loss=0.01486, audio_tagging_loss=0.0107, over 15026.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08979, pruned_loss=0.01223, audio_tagging_loss=0.008684, over 3051323.06 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:23:19,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3704340.0, ans=0.2
2023-11-27 03:23:22,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3704340.0, ans=0.0
2023-11-27 03:23:39,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3704473.3333333335, ans=0.125
2023-11-27 03:23:49,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704473.3333333335, ans=0.1
2023-11-27 03:24:08,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0
2023-11-27 03:24:10,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3704606.6666666665, ans=0.05
2023-11-27 03:24:10,994 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555700
2023-11-27 03:24:14,137 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2600, loss[loss=0.03188, simple_loss=0.03588, pruned_loss=0.002884, audio_tagging_loss=0.01106, over 14870.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.0901, pruned_loss=0.01239, audio_tagging_loss=0.008582, over 3046658.67 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:24:22,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 9.053e+01 9.535e+01 1.024e+02 1.234e+02, threshold=1.907e+02, percent-clipped=0.0
2023-11-27 03:24:25,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.79 vs. limit=6.0
2023-11-27 03:24:54,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3704873.3333333335, ans=0.125
2023-11-27 03:24:59,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3704940.0, ans=0.125
2023-11-27 03:25:07,326 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555750
2023-11-27 03:25:10,363 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2650, loss[loss=0.05904, simple_loss=0.0717, pruned_loss=0.01318, audio_tagging_loss=0.01001, over 14645.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08975, pruned_loss=0.01236, audio_tagging_loss=0.008521, over 3048307.12 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:25:43,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3705206.6666666665, ans=0.05
2023-11-27 03:25:50,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3705206.6666666665, ans=0.0
2023-11-27 03:26:02,926 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555800
2023-11-27 03:26:06,305 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2700, loss[loss=0.05725, simple_loss=0.0742, pruned_loss=0.01035, audio_tagging_loss=0.0098, over 16466.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.0899, pruned_loss=0.01244, audio_tagging_loss=0.008492, over 3056944.14 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:26:13,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 9.070e+01 9.755e+01 1.047e+02 1.495e+02, threshold=1.951e+02, percent-clipped=0.0
2023-11-27 03:26:42,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3705540.0, ans=0.0
2023-11-27 03:26:58,390 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555850
2023-11-27 03:27:01,516 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2750, loss[loss=0.05504, simple_loss=0.07476, pruned_loss=0.007452, audio_tagging_loss=0.01021, over 15971.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08904, pruned_loss=0.01216, audio_tagging_loss=0.008541, over 3050129.97 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:27:03,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3705673.3333333335, ans=0.125
2023-11-27 03:27:04,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0
2023-11-27 03:27:05,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3705673.3333333335, ans=0.07
2023-11-27 03:27:06,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3705673.3333333335, ans=0.025
2023-11-27 03:27:15,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3705740.0, ans=0.125
2023-11-27 03:27:47,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3705940.0, ans=0.0
2023-11-27 03:27:49,887 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 03:27:52,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3705940.0, ans=0.125
2023-11-27 03:27:52,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3705940.0, ans=0.125
2023-11-27 03:27:52,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0
2023-11-27 03:27:54,247 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555900
2023-11-27 03:27:57,997 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2800, loss[loss=0.04782, simple_loss=0.06279, pruned_loss=0.006736, audio_tagging_loss=0.009692, over 15093.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08858, pruned_loss=0.01195, audio_tagging_loss=0.008564, over 3056249.56 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:28:05,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.970e+01 9.604e+01 1.036e+02 1.276e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-27 03:28:11,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.69 vs. limit=15.0
2023-11-27 03:28:20,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3706140.0, ans=0.04949747468305833
2023-11-27 03:28:23,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0
2023-11-27 03:28:27,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3706140.0, ans=0.0
2023-11-27 03:28:31,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3706206.6666666665, ans=0.05
2023-11-27 03:28:37,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=22.5
2023-11-27 03:28:44,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0
2023-11-27 03:28:50,583 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555950
2023-11-27 03:28:50,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3706273.3333333335, ans=0.125
2023-11-27 03:28:54,313 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2850, loss[loss=0.05479, simple_loss=0.07339, pruned_loss=0.008778, audio_tagging_loss=0.009315, over 14748.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08893, pruned_loss=0.012, audio_tagging_loss=0.008481, over 3052210.37 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:29:08,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3706406.6666666665, ans=0.2
2023-11-27 03:29:12,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3706406.6666666665, ans=0.07
2023-11-27 03:29:13,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3706406.6666666665, ans=0.0
2023-11-27 03:29:20,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0
2023-11-27 03:29:46,441 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556000
2023-11-27 03:29:51,992 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2900, loss[loss=0.04199, simple_loss=0.05806, pruned_loss=0.002578, audio_tagging_loss=0.01038, over 15030.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08845, pruned_loss=0.01194, audio_tagging_loss=0.008612, over 3052105.05 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:29:59,420 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.856e+01 9.574e+01 1.046e+02 1.351e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-27 03:30:09,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3706740.0, ans=0.1
2023-11-27 03:30:13,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3706740.0, ans=0.1
2023-11-27 03:30:30,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3706873.3333333335, ans=0.0
2023-11-27 03:30:35,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3706940.0, ans=10.0
2023-11-27 03:30:40,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3706940.0, ans=0.125
2023-11-27 03:30:42,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3706940.0, ans=0.1
2023-11-27 03:30:44,348 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556050
2023-11-27 03:30:47,481 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2950, loss[loss=0.06534, simple_loss=0.09317, pruned_loss=0.01014, audio_tagging_loss=0.008613, over 15172.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08887, pruned_loss=0.01204, audio_tagging_loss=0.008641, over 3053638.09 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0
], tot_loss[loss=0.06512, simple_loss=0.08887, pruned_loss=0.01204, audio_tagging_loss=0.008641, over 3053638.09 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:30:47,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3707006.6666666665, ans=0.0 2023-11-27 03:30:49,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3707006.6666666665, ans=0.125 2023-11-27 03:31:04,742 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:31:14,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3707140.0, ans=0.0 2023-11-27 03:31:37,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3707273.3333333335, ans=0.125 2023-11-27 03:31:40,938 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556100 2023-11-27 03:31:44,039 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3000, loss[loss=0.08356, simple_loss=0.1148, pruned_loss=0.01566, audio_tagging_loss=0.01051, over 17083.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09009, pruned_loss=0.01225, audio_tagging_loss=0.008731, over 3055499.49 frames. ], batch size: 64, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:31:44,040 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 03:32:16,614 INFO [train_asr.py:1267] (1/4) Epoch 47, validation: loss=0.05735, simple_loss=0.05053, pruned_loss=0.005352, audio_tagging_loss=0.02673, over 4681554.00 frames. 2023-11-27 03:32:16,615 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 03:32:25,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 9.179e+01 9.770e+01 1.041e+02 1.490e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-27 03:32:31,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3707406.6666666665, ans=0.1 2023-11-27 03:32:56,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3707540.0, ans=0.125 2023-11-27 03:33:09,525 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556150 2023-11-27 03:33:10,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3707606.6666666665, ans=0.125 2023-11-27 03:33:13,153 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3050, loss[loss=0.07803, simple_loss=0.1109, pruned_loss=0.01682, audio_tagging_loss=0.005771, over 15306.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09107, pruned_loss=0.01226, audio_tagging_loss=0.00864, over 3055189.17 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:33:44,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3707806.6666666665, ans=0.125 2023-11-27 03:33:45,035 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 03:34:05,882 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556200
2023-11-27 03:34:09,363 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3100, loss[loss=0.08649, simple_loss=0.1189, pruned_loss=0.01569, audio_tagging_loss=0.01137, over 17134.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09119, pruned_loss=0.01239, audio_tagging_loss=0.008655, over 3050097.16 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:34:12,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3708006.6666666665, ans=0.125
2023-11-27 03:34:12,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3708006.6666666665, ans=0.0
2023-11-27 03:34:17,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3708006.6666666665, ans=0.02
2023-11-27 03:34:17,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 9.052e+01 9.705e+01 1.059e+02 1.500e+02, threshold=1.941e+02, percent-clipped=0.0
2023-11-27 03:34:32,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3708140.0, ans=0.125
2023-11-27 03:34:36,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=22.5
2023-11-27 03:34:38,218 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:34:45,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3708206.6666666665, ans=0.0
2023-11-27 03:35:01,575 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556250
2023-11-27 03:35:05,241 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3150, loss[loss=0.07563, simple_loss=0.1088, pruned_loss=0.01197, audio_tagging_loss=0.009264, over 14689.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09124, pruned_loss=0.0124, audio_tagging_loss=0.008738, over 3049681.24 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:35:05,451 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:35:29,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3708473.3333333335, ans=0.0
2023-11-27 03:35:40,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3708540.0, ans=0.0
2023-11-27 03:35:58,267 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556300
2023-11-27 03:36:01,313 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3200, loss[loss=0.07971, simple_loss=0.1124, pruned_loss=0.01811, audio_tagging_loss=0.005384, over 15362.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09143, pruned_loss=0.01238, audio_tagging_loss=0.008738, over 3053114.75 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0
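The scaling.py:213 lines above report ScheduledFloat values: per-module constants (balancer probabilities, skip rates, min_abs limits and so on) that are annealed as a function of batch_count rather than held fixed, with ans= giving the value currently in effect. A minimal sketch of such a schedule, written in plain Python with made-up breakpoints; the real schedules are defined in the zipformer/scaling code and are not recoverable from this log:

# Piecewise-linear schedule over batch_count (illustrative; the breakpoints
# below are invented for this example, not taken from the model).
def scheduled_float(batch_count, points):
    # points: [(batch_count, value), ...] sorted by batch_count
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A skip-rate decaying from 0.3 to 0.0 over the first 20k batches has long
# since reached its final value at batch_count ~ 3.7e6:
print(scheduled_float(3708006.6666666665, [(0.0, 0.3), (20000.0, 0.0)]))  # 0.0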
2023-11-27 03:36:10,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.966e+01 9.584e+01 1.017e+02 1.282e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-27 03:36:23,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3708806.6666666665, ans=0.125
2023-11-27 03:36:32,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3708806.6666666665, ans=0.0
2023-11-27 03:36:39,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3708873.3333333335, ans=0.035
2023-11-27 03:36:43,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3708873.3333333335, ans=10.0
2023-11-27 03:36:54,335 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556350
2023-11-27 03:36:57,430 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3250, loss[loss=0.07331, simple_loss=0.09912, pruned_loss=0.01551, audio_tagging_loss=0.008236, over 14779.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09114, pruned_loss=0.01237, audio_tagging_loss=0.008816, over 3044495.15 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:37:03,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2023-11-27 03:37:17,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3709073.3333333335, ans=0.125
2023-11-27 03:37:24,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3709140.0, ans=0.125
2023-11-27 03:37:25,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3709140.0, ans=0.0
2023-11-27 03:37:28,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3709140.0, ans=0.2
2023-11-27 03:37:48,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3709273.3333333335, ans=0.125
2023-11-27 03:37:49,586 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556400
2023-11-27 03:37:52,937 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3300, loss[loss=0.07339, simple_loss=0.092, pruned_loss=0.01889, audio_tagging_loss=0.008496, over 13824.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.0914, pruned_loss=0.01232, audio_tagging_loss=0.008847, over 3049600.37 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0
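To within rounding, every tot_loss entry in these train_asr.py:1235 lines satisfies loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss: the simple (linear) transducer loss enters at half weight alongside the pruned transducer loss and the audio-tagging distillation loss. A quick consistency check against the batch 3300 totals above; this is an illustrative reconstruction of the reported sum, not the code in train_asr.py:

# Recombine the logged components and compare with the logged total.
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# tot_loss at epoch 47, batch 3300 (from the entry above):
assert abs(combined_loss(0.0914, 0.01232, 0.008847) - 0.06686) < 5e-5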
2023-11-27 03:37:53,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3709340.0, ans=0.125
2023-11-27 03:37:55,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3709340.0, ans=0.125
2023-11-27 03:38:02,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 9.106e+01 9.727e+01 1.041e+02 1.146e+02, threshold=1.945e+02, percent-clipped=0.0
2023-11-27 03:38:05,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3709406.6666666665, ans=0.0
2023-11-27 03:38:12,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3709406.6666666665, ans=0.125
2023-11-27 03:38:21,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3709473.3333333335, ans=0.125
2023-11-27 03:38:34,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3709540.0, ans=0.0
2023-11-27 03:38:38,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.59 vs. limit=15.0
2023-11-27 03:38:45,552 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556450
2023-11-27 03:38:47,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3709606.6666666665, ans=0.2
2023-11-27 03:38:49,258 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3350, loss[loss=0.06596, simple_loss=0.09261, pruned_loss=0.01328, audio_tagging_loss=0.006374, over 14908.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09115, pruned_loss=0.01235, audio_tagging_loss=0.008812, over 3052191.15 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:39:02,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3709740.0, ans=0.1
2023-11-27 03:39:25,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.85 vs. limit=10.0
2023-11-27 03:39:26,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=15.0
2023-11-27 03:39:32,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3709873.3333333335, ans=0.2
2023-11-27 03:39:42,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556500
2023-11-27 03:39:45,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3710006.6666666665, ans=0.1
2023-11-27 03:39:45,908 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3400, loss[loss=0.05421, simple_loss=0.08116, pruned_loss=0.007349, audio_tagging_loss=0.006279, over 15472.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08932, pruned_loss=0.01211, audio_tagging_loss=0.008797, over 3052865.39 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
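In the optim.py:476 lines, the five grad-norm values read as the min / 25% / median / 75% / max of gradient norms over a recent window of batches, and the printed threshold is Clipping_scale times the median (2.0 x 9.727e+01 = 1.945e+02 in the entry above); percent-clipped is the share of batches whose norm exceeded that threshold. A short sketch of this bookkeeping over a plain tensor of recent norms; the window handling here is an assumption, not a copy of icefall's optim.py:

import torch

def clipping_summary(grad_norms, clipping_scale=2.0):
    # grad_norms: 1-D tensor of per-batch gradient norms from a recent window.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale times the median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped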
2023-11-27 03:39:55,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 9.012e+01 9.564e+01 1.021e+02 1.293e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-27 03:39:57,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3710073.3333333335, ans=0.125
2023-11-27 03:39:59,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2023-11-27 03:40:02,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0
2023-11-27 03:40:30,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3710273.3333333335, ans=0.125
2023-11-27 03:40:38,004 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556550
2023-11-27 03:40:41,079 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3450, loss[loss=0.05975, simple_loss=0.07858, pruned_loss=0.01084, audio_tagging_loss=0.009621, over 13881.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08985, pruned_loss=0.0122, audio_tagging_loss=0.00861, over 3049659.09 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:40:42,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3710340.0, ans=0.125
2023-11-27 03:40:51,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.77 vs. limit=22.5
2023-11-27 03:41:15,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0
2023-11-27 03:41:21,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3710540.0, ans=0.125
2023-11-27 03:41:24,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3710606.6666666665, ans=0.5
2023-11-27 03:41:32,657 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556600
2023-11-27 03:41:36,605 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3500, loss[loss=0.06966, simple_loss=0.0931, pruned_loss=0.01424, audio_tagging_loss=0.008871, over 14597.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08988, pruned_loss=0.01215, audio_tagging_loss=0.008567, over 3047699.97 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:41:47,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 8.850e+01 9.448e+01 1.007e+02 1.285e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-27 03:41:48,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3710740.0, ans=0.1
2023-11-27 03:42:05,881 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
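These Exclude-cut warnings all follow the same pattern: the excluded AudioSet cuts are 1 s long (100 feature frames), which the encoder's subsampling reduces to 23 frames, while the placeholder transcript encodes to 24 BPE tokens; a transducer cannot emit more symbols than it has encoder frames, so such cuts are dropped. A sketch of that filter, assuming the ((T - 7) // 2) // 2 subsampling arithmetic implied by the 100 -> 23 figures in the warning (the exact expression in train_asr.py may differ):

# Illustrative version of the length filter behind train_asr.py:1481.
def keep_cut(num_frames, num_tokens):
    frames_after_subsampling = ((num_frames - 7) // 2) // 2  # 100 -> 23
    # The transducer needs at least one encoder frame per output token.
    return frames_after_subsampling >= num_tokens

assert keep_cut(num_frames=1000, num_tokens=24)      # a typical 10 s utterance
assert not keep_cut(num_frames=100, num_tokens=24)   # the 1 s cuts warned about here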
2023-11-27 03:42:07,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.20 vs. limit=22.5
2023-11-27 03:42:10,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3710873.3333333335, ans=0.1
2023-11-27 03:42:11,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.41 vs. limit=10.0
2023-11-27 03:42:12,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3710873.3333333335, ans=0.125
2023-11-27 03:42:16,757 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:42:29,915 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556650
2023-11-27 03:42:33,583 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3550, loss[loss=0.06322, simple_loss=0.08553, pruned_loss=0.01064, audio_tagging_loss=0.009814, over 15288.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09005, pruned_loss=0.01211, audio_tagging_loss=0.008528, over 3043236.48 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:43:00,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3711140.0, ans=0.07
2023-11-27 03:43:06,920 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:43:13,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=15.0
2023-11-27 03:43:25,856 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556700
2023-11-27 03:43:29,010 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3600, loss[loss=0.0677, simple_loss=0.09198, pruned_loss=0.0141, audio_tagging_loss=0.007602, over 14709.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08923, pruned_loss=0.01193, audio_tagging_loss=0.008481, over 3040559.58 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:43:38,654 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.614e+01 9.369e+01 1.008e+02 1.433e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-27 03:43:46,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0
2023-11-27 03:43:55,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.31 vs. limit=6.0
2023-11-27 03:44:06,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs.
limit=15.0 2023-11-27 03:44:13,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3711606.6666666665, ans=0.0 2023-11-27 03:44:20,560 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556750 2023-11-27 03:44:23,831 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3650, loss[loss=0.04112, simple_loss=0.05547, pruned_loss=0.005388, audio_tagging_loss=0.007996, over 15040.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08871, pruned_loss=0.01177, audio_tagging_loss=0.008446, over 3042303.20 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:45:03,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3711873.3333333335, ans=0.125 2023-11-27 03:45:10,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2023-11-27 03:45:13,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3711940.0, ans=0.0 2023-11-27 03:45:17,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556800 2023-11-27 03:45:20,978 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3700, loss[loss=0.07184, simple_loss=0.1009, pruned_loss=0.01304, audio_tagging_loss=0.008371, over 16227.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08959, pruned_loss=0.01191, audio_tagging_loss=0.008422, over 3048213.10 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:45:31,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 9.060e+01 9.619e+01 1.026e+02 1.251e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-27 03:45:38,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3712073.3333333335, ans=0.2 2023-11-27 03:45:39,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3712073.3333333335, ans=0.2 2023-11-27 03:45:54,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3712206.6666666665, ans=0.125 2023-11-27 03:46:13,702 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556850 2023-11-27 03:46:16,766 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3750, loss[loss=0.06616, simple_loss=0.0964, pruned_loss=0.01135, audio_tagging_loss=0.006614, over 14844.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08948, pruned_loss=0.01187, audio_tagging_loss=0.008462, over 3048280.09 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:46:29,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3712406.6666666665, ans=0.1 2023-11-27 03:46:39,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3712473.3333333335, ans=0.2 2023-11-27 03:46:49,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3712540.0, ans=0.125 2023-11-27 03:46:54,971 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:46:55,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3712540.0, ans=0.125 2023-11-27 03:47:03,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3712606.6666666665, ans=0.0 2023-11-27 03:47:05,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3712606.6666666665, ans=0.0 2023-11-27 03:47:08,951 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556900 2023-11-27 03:47:12,067 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3800, loss[loss=0.05199, simple_loss=0.075, pruned_loss=0.006855, audio_tagging_loss=0.007635, over 15650.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08953, pruned_loss=0.01186, audio_tagging_loss=0.00847, over 3046851.63 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:47:12,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2023-11-27 03:47:24,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.007e+01 9.077e+01 9.693e+01 1.049e+02 1.287e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 03:47:57,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3712940.0, ans=0.1 2023-11-27 03:48:05,199 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556950 2023-11-27 03:48:08,320 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3850, loss[loss=0.05531, simple_loss=0.07123, pruned_loss=0.009848, audio_tagging_loss=0.009851, over 14537.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08957, pruned_loss=0.0119, audio_tagging_loss=0.008603, over 3045194.04 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:48:26,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3713073.3333333335, ans=0.1 2023-11-27 03:48:30,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3713140.0, ans=0.125 2023-11-27 03:48:31,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3713140.0, ans=15.0 2023-11-27 03:48:43,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3713206.6666666665, ans=0.125 2023-11-27 03:48:48,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3713206.6666666665, ans=0.125 2023-11-27 03:49:01,537 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557000 2023-11-27 03:49:02,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3713273.3333333335, ans=0.07 2023-11-27 03:49:04,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3713340.0, ans=0.035 2023-11-27 03:49:04,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3713340.0, ans=0.0 2023-11-27 03:49:05,018 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3900, loss[loss=0.06883, simple_loss=0.09729, pruned_loss=0.01335, audio_tagging_loss=0.006837, over 14128.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08921, pruned_loss=0.01173, audio_tagging_loss=0.00864, over 3042088.78 frames. ], batch size: 52, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:49:16,693 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 9.076e+01 9.575e+01 1.020e+02 1.197e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 03:49:19,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3713406.6666666665, ans=0.0 2023-11-27 03:49:39,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.94 vs. limit=22.5 2023-11-27 03:49:39,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3713540.0, ans=0.04949747468305833 2023-11-27 03:49:57,106 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557050 2023-11-27 03:50:00,209 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3950, loss[loss=0.06716, simple_loss=0.0902, pruned_loss=0.01027, audio_tagging_loss=0.01179, over 14537.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08875, pruned_loss=0.01174, audio_tagging_loss=0.008816, over 3043143.71 frames. 
], batch size: 53, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:50:19,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3713740.0, ans=0.0 2023-11-27 03:50:30,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3713806.6666666665, ans=0.125 2023-11-27 03:50:52,437 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557100 2023-11-27 03:50:56,116 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4000, loss[loss=0.05917, simple_loss=0.07949, pruned_loss=0.007826, audio_tagging_loss=0.0116, over 14886.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.0896, pruned_loss=0.01196, audio_tagging_loss=0.008889, over 3040832.62 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:50:58,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3714006.6666666665, ans=0.0 2023-11-27 03:51:04,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=12.0 2023-11-27 03:51:08,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.327e+01 9.767e+01 1.033e+02 1.414e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 03:51:30,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3714206.6666666665, ans=0.125 2023-11-27 03:51:33,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3714206.6666666665, ans=0.125 2023-11-27 03:51:48,486 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557150 2023-11-27 03:51:52,056 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4050, loss[loss=0.06963, simple_loss=0.08873, pruned_loss=0.01645, audio_tagging_loss=0.008812, over 14649.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08961, pruned_loss=0.01198, audio_tagging_loss=0.008847, over 3042467.21 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:51:54,221 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:51:58,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3714340.0, ans=0.0 2023-11-27 03:52:01,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.11 vs. limit=10.0 2023-11-27 03:52:08,291 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:52:13,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3714473.3333333335, ans=0.95 2023-11-27 03:52:27,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. 
limit=15.0 2023-11-27 03:52:30,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=22.5 2023-11-27 03:52:44,079 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557200 2023-11-27 03:52:47,549 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4100, loss[loss=0.07151, simple_loss=0.09823, pruned_loss=0.01297, audio_tagging_loss=0.009433, over 15056.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08956, pruned_loss=0.01194, audio_tagging_loss=0.008783, over 3046599.91 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:52:54,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3714673.3333333335, ans=0.125 2023-11-27 03:52:58,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3714740.0, ans=0.2 2023-11-27 03:52:59,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 9.062e+01 9.739e+01 1.030e+02 1.331e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:53:00,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3714740.0, ans=0.1 2023-11-27 03:53:16,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3714806.6666666665, ans=0.05 2023-11-27 03:53:32,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2023-11-27 03:53:33,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3714940.0, ans=0.1 2023-11-27 03:53:34,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3714940.0, ans=0.1 2023-11-27 03:53:40,363 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557250 2023-11-27 03:53:42,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3715006.6666666665, ans=0.2 2023-11-27 03:53:43,500 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4150, loss[loss=0.05822, simple_loss=0.08373, pruned_loss=0.007981, audio_tagging_loss=0.008377, over 15004.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.0902, pruned_loss=0.01202, audio_tagging_loss=0.008671, over 3044359.60 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:54:00,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3715073.3333333335, ans=0.125 2023-11-27 03:54:10,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. 
limit=15.0 2023-11-27 03:54:14,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3715140.0, ans=0.1 2023-11-27 03:54:18,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3715206.6666666665, ans=0.125 2023-11-27 03:54:22,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3715206.6666666665, ans=0.125 2023-11-27 03:54:24,031 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:54:27,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3715273.3333333335, ans=0.125 2023-11-27 03:54:30,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.80 vs. limit=10.0 2023-11-27 03:54:34,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3715273.3333333335, ans=0.125 2023-11-27 03:54:35,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3715273.3333333335, ans=0.0 2023-11-27 03:54:36,724 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557300 2023-11-27 03:54:39,857 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4200, loss[loss=0.091, simple_loss=0.1287, pruned_loss=0.01907, audio_tagging_loss=0.007599, over 16010.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08951, pruned_loss=0.01191, audio_tagging_loss=0.008613, over 3039334.20 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:54:51,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 8.957e+01 9.619e+01 1.045e+02 2.364e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-27 03:54:57,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.79 vs. limit=6.0 2023-11-27 03:55:04,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3715473.3333333335, ans=0.125 2023-11-27 03:55:17,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3715540.0, ans=0.125 2023-11-27 03:55:17,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3715540.0, ans=0.0 2023-11-27 03:55:24,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3715606.6666666665, ans=0.0 2023-11-27 03:55:32,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.76 vs. 
limit=15.0 2023-11-27 03:55:32,784 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557350 2023-11-27 03:55:35,911 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4250, loss[loss=0.04823, simple_loss=0.06407, pruned_loss=0.006466, audio_tagging_loss=0.009728, over 14484.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08912, pruned_loss=0.01178, audio_tagging_loss=0.008523, over 3042739.90 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:55:39,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3715673.3333333335, ans=0.1 2023-11-27 03:55:52,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3715740.0, ans=0.2 2023-11-27 03:55:59,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3715806.6666666665, ans=10.0 2023-11-27 03:56:04,562 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:56:07,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3715806.6666666665, ans=0.0 2023-11-27 03:56:08,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3715873.3333333335, ans=0.125 2023-11-27 03:56:16,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3715873.3333333335, ans=0.125 2023-11-27 03:56:23,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3715940.0, ans=0.0 2023-11-27 03:56:28,279 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557400 2023-11-27 03:56:31,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2023-11-27 03:56:31,983 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4300, loss[loss=0.08375, simple_loss=0.1263, pruned_loss=0.01499, audio_tagging_loss=0.005624, over 15514.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09093, pruned_loss=0.01205, audio_tagging_loss=0.008321, over 3041609.56 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:56:33,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3716006.6666666665, ans=0.125 2023-11-27 03:56:34,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.44 vs. 
limit=8.0 2023-11-27 03:56:43,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716073.3333333335, ans=0.1 2023-11-27 03:56:44,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 9.063e+01 9.742e+01 1.048e+02 1.434e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:56:46,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3716073.3333333335, ans=0.0 2023-11-27 03:57:01,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3716140.0, ans=0.125 2023-11-27 03:57:04,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3716206.6666666665, ans=0.125 2023-11-27 03:57:08,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3716206.6666666665, ans=0.125 2023-11-27 03:57:16,586 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:57:24,961 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557450 2023-11-27 03:57:27,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716340.0, ans=0.1 2023-11-27 03:57:28,059 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4350, loss[loss=0.06363, simple_loss=0.08711, pruned_loss=0.01077, audio_tagging_loss=0.009307, over 14203.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09153, pruned_loss=0.01223, audio_tagging_loss=0.008256, over 3048472.18 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:57:28,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3716340.0, ans=0.125 2023-11-27 03:57:30,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3716340.0, ans=0.0 2023-11-27 03:57:31,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3716340.0, ans=0.0 2023-11-27 03:57:39,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3716406.6666666665, ans=15.0 2023-11-27 03:57:47,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2023-11-27 03:58:10,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2023-11-27 03:58:20,051 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557500 2023-11-27 03:58:23,186 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4400, loss[loss=0.08264, simple_loss=0.1277, pruned_loss=0.01108, audio_tagging_loss=0.007718, over 14971.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09071, pruned_loss=0.01215, audio_tagging_loss=0.008335, over 3044730.07 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:58:28,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=15.0 2023-11-27 03:58:31,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3716673.3333333335, ans=0.1 2023-11-27 03:58:35,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.054e+01 9.094e+01 9.740e+01 1.025e+02 1.251e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:58:42,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2023-11-27 03:58:42,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0 2023-11-27 03:58:46,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3716806.6666666665, ans=0.125 2023-11-27 03:58:51,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3716806.6666666665, ans=0.125 2023-11-27 03:58:52,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3716806.6666666665, ans=0.125 2023-11-27 03:59:05,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3716873.3333333335, ans=0.2 2023-11-27 03:59:15,452 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557550 2023-11-27 03:59:18,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2023-11-27 03:59:18,576 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4450, loss[loss=0.07566, simple_loss=0.1111, pruned_loss=0.01306, audio_tagging_loss=0.007061, over 15447.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09068, pruned_loss=0.01212, audio_tagging_loss=0.008355, over 3041110.32 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:00:11,601 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557600 2023-11-27 04:00:15,457 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4500, loss[loss=0.07086, simple_loss=0.09251, pruned_loss=0.01629, audio_tagging_loss=0.008316, over 15132.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09021, pruned_loss=0.01208, audio_tagging_loss=0.00838, over 3044694.29 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:00:27,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 9.128e+01 9.724e+01 1.026e+02 1.221e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 04:00:37,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3717473.3333333335, ans=0.125 2023-11-27 04:00:39,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3717473.3333333335, ans=0.04949747468305833 2023-11-27 04:00:42,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3717473.3333333335, ans=0.0 2023-11-27 04:00:56,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3717540.0, ans=0.2 2023-11-27 04:01:02,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3717606.6666666665, ans=0.125 2023-11-27 04:01:03,764 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:01:04,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3717606.6666666665, ans=0.125 2023-11-27 04:01:07,881 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557650 2023-11-27 04:01:11,019 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4550, loss[loss=0.0662, simple_loss=0.0844, pruned_loss=0.01449, audio_tagging_loss=0.009505, over 15600.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08863, pruned_loss=0.01186, audio_tagging_loss=0.008494, over 3045876.75 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:01:13,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2023-11-27 04:01:14,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3717673.3333333335, ans=0.0 2023-11-27 04:01:31,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.83 vs. limit=15.0 2023-11-27 04:01:49,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3717873.3333333335, ans=0.125 2023-11-27 04:01:53,961 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 04:01:56,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3717940.0, ans=0.125 2023-11-27 04:02:03,749 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557700 2023-11-27 04:02:03,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3717940.0, ans=0.0 2023-11-27 04:02:07,398 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4600, loss[loss=0.05507, simple_loss=0.07492, pruned_loss=0.007322, audio_tagging_loss=0.01029, over 14917.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08852, pruned_loss=0.01176, audio_tagging_loss=0.008532, over 3040334.13 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:02:12,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3718006.6666666665, ans=0.2 2023-11-27 04:02:20,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.735e+01 9.058e+01 9.556e+01 1.027e+02 1.489e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-27 04:02:30,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3718140.0, ans=0.125 2023-11-27 04:02:47,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3718206.6666666665, ans=0.125 2023-11-27 04:02:49,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3718206.6666666665, ans=0.1 2023-11-27 04:03:00,549 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557750 2023-11-27 04:03:04,170 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4650, loss[loss=0.07675, simple_loss=0.1077, pruned_loss=0.0134, audio_tagging_loss=0.009483, over 15207.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08907, pruned_loss=0.0118, audio_tagging_loss=0.008561, over 3039765.90 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:03:19,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3718406.6666666665, ans=0.125 2023-11-27 04:03:26,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.33 vs. limit=22.5 2023-11-27 04:03:46,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3718540.0, ans=0.025 2023-11-27 04:03:49,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3718606.6666666665, ans=0.0 2023-11-27 04:03:56,138 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557800 2023-11-27 04:03:59,559 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4700, loss[loss=0.05628, simple_loss=0.0708, pruned_loss=0.01056, audio_tagging_loss=0.01032, over 14561.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09014, pruned_loss=0.01196, audio_tagging_loss=0.008699, over 3049080.87 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:04:00,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.25 vs. 
limit=22.5 2023-11-27 04:04:00,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3718673.3333333335, ans=0.125 2023-11-27 04:04:07,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3718673.3333333335, ans=0.2 2023-11-27 04:04:10,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0 2023-11-27 04:04:11,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.51 vs. limit=22.5 2023-11-27 04:04:12,170 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 9.072e+01 9.943e+01 1.043e+02 1.382e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-27 04:04:18,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3718740.0, ans=0.125 2023-11-27 04:04:33,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3718873.3333333335, ans=0.125 2023-11-27 04:04:37,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3718873.3333333335, ans=0.125 2023-11-27 04:04:51,008 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557850 2023-11-27 04:04:54,123 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4750, loss[loss=0.04736, simple_loss=0.06017, pruned_loss=0.005558, audio_tagging_loss=0.01172, over 14855.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08978, pruned_loss=0.01191, audio_tagging_loss=0.008796, over 3048822.62 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:05:00,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3719006.6666666665, ans=6.0 2023-11-27 04:05:05,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3719073.3333333335, ans=0.1 2023-11-27 04:05:06,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2023-11-27 04:05:07,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3719073.3333333335, ans=0.015 2023-11-27 04:05:11,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3719073.3333333335, ans=0.0 2023-11-27 04:05:29,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3719206.6666666665, ans=0.125 2023-11-27 04:05:47,629 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557900 2023-11-27 04:05:50,734 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4800, loss[loss=0.05255, simple_loss=0.06107, pruned_loss=0.01042, audio_tagging_loss=0.0116, over 15846.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09082, pruned_loss=0.01206, audio_tagging_loss=0.008857, over 3053250.27 frames. 
], batch size: 64, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:05:57,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3719340.0, ans=0.0 2023-11-27 04:06:03,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2023-11-27 04:06:05,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.052e+01 9.526e+01 1.032e+02 1.738e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-27 04:06:14,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3719473.3333333335, ans=0.09899494936611666 2023-11-27 04:06:24,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3719540.0, ans=0.125 2023-11-27 04:06:41,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3719606.6666666665, ans=0.125 2023-11-27 04:06:43,258 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557950 2023-11-27 04:06:44,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2023-11-27 04:06:46,400 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4850, loss[loss=0.05919, simple_loss=0.07973, pruned_loss=0.009897, audio_tagging_loss=0.009432, over 15917.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09008, pruned_loss=0.01201, audio_tagging_loss=0.008985, over 3057012.54 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:06:53,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3719673.3333333335, ans=0.2 2023-11-27 04:06:59,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3719740.0, ans=0.125 2023-11-27 04:07:28,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3719873.3333333335, ans=0.2 2023-11-27 04:07:38,009 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558000 2023-11-27 04:07:41,444 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4900, loss[loss=0.07093, simple_loss=0.1037, pruned_loss=0.01198, audio_tagging_loss=0.007102, over 14417.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08986, pruned_loss=0.012, audio_tagging_loss=0.008949, over 3050122.19 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:07:54,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3720073.3333333335, ans=0.1 2023-11-27 04:07:56,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.861e+01 9.533e+01 1.009e+02 1.253e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 04:08:04,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2023-11-27 04:08:33,869 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558050 2023-11-27 04:08:37,508 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4950, loss[loss=0.06344, simple_loss=0.08803, pruned_loss=0.01194, audio_tagging_loss=0.007487, over 14910.00 frames. 
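The recurring "Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=..." lines from optim.py summarize gradient-norm statistics over a recent window: the five numbers read as min/25%/median/75%/max, the threshold is consistently clipping_scale times the logged median (e.g. 2.0 * 9.526e+01 = 1.905e+02 in the entry above), and percent-clipped appears to be the percentage of recent steps whose norm exceeded it. A hedged sketch of that bookkeeping, with the window size and class name assumed rather than taken from the ScaledAdam implementation:

from collections import deque

import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent per-step gradient norms

    def step(self, grad_norm: float) -> float:
        """Record grad_norm; return the factor to scale this step's grads by."""
        self.norms.append(grad_norm)
        q = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),  # the logged quartiles
        )
        threshold = self.clipping_scale * q[2].item()  # 2.0 * median
        return min(1.0, threshold / (grad_norm + 1.0e-20))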
], tot_loss[loss=0.06558, simple_loss=0.08947, pruned_loss=0.01205, audio_tagging_loss=0.008799, over 3041843.58 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:08:43,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3720340.0, ans=0.125 2023-11-27 04:08:43,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3720340.0, ans=0.0 2023-11-27 04:08:50,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.18 vs. limit=6.0 2023-11-27 04:08:54,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3720406.6666666665, ans=0.0 2023-11-27 04:09:10,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3720540.0, ans=0.125 2023-11-27 04:09:20,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3720540.0, ans=0.2 2023-11-27 04:09:27,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3720606.6666666665, ans=0.0 2023-11-27 04:09:31,021 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558100 2023-11-27 04:09:34,159 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5000, loss[loss=0.0567, simple_loss=0.0807, pruned_loss=0.00765, audio_tagging_loss=0.008701, over 14558.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08854, pruned_loss=0.0118, audio_tagging_loss=0.008741, over 3041190.81 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:09:47,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.780e+01 9.541e+01 1.005e+02 1.452e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 04:09:58,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3720806.6666666665, ans=0.125 2023-11-27 04:10:19,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3720940.0, ans=0.0 2023-11-27 04:10:23,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3720940.0, ans=0.125 2023-11-27 04:10:25,992 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558150 2023-11-27 04:10:29,095 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5050, loss[loss=0.04571, simple_loss=0.05375, pruned_loss=0.007527, audio_tagging_loss=0.01131, over 15610.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08877, pruned_loss=0.01201, audio_tagging_loss=0.008634, over 3041236.67 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:10:49,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.74 vs. limit=15.0 2023-11-27 04:11:06,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3721206.6666666665, ans=0.125 2023-11-27 04:11:08,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.88 vs. 
limit=10.0 2023-11-27 04:11:08,760 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:11:20,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=12.0 2023-11-27 04:11:21,952 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558200 2023-11-27 04:11:25,287 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5100, loss[loss=0.05, simple_loss=0.05908, pruned_loss=0.005827, audio_tagging_loss=0.01464, over 14596.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08812, pruned_loss=0.01183, audio_tagging_loss=0.00874, over 3033683.47 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:11:27,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3721340.0, ans=0.125 2023-11-27 04:11:36,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2023-11-27 04:11:37,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3721406.6666666665, ans=0.125 2023-11-27 04:11:40,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.877e+01 9.521e+01 1.045e+02 1.362e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:12:02,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3721540.0, ans=10.0 2023-11-27 04:12:19,036 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558250 2023-11-27 04:12:22,229 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5150, loss[loss=0.05597, simple_loss=0.07409, pruned_loss=0.007826, audio_tagging_loss=0.0111, over 14720.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08723, pruned_loss=0.01161, audio_tagging_loss=0.008797, over 3038868.97 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:12:32,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3721740.0, ans=0.0 2023-11-27 04:12:32,931 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:13:14,252 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558300 2023-11-27 04:13:17,298 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5200, loss[loss=0.06268, simple_loss=0.08367, pruned_loss=0.009492, audio_tagging_loss=0.01135, over 14831.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08875, pruned_loss=0.01195, audio_tagging_loss=0.008699, over 3037472.10 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:13:22,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.88 vs. 
limit=10.0 2023-11-27 04:13:29,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3722073.3333333335, ans=0.125 2023-11-27 04:13:31,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 8.871e+01 9.548e+01 1.019e+02 1.156e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 04:13:42,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-27 04:13:49,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3722140.0, ans=0.0 2023-11-27 04:13:54,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2023-11-27 04:14:05,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3722273.3333333335, ans=0.125 2023-11-27 04:14:09,366 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558350 2023-11-27 04:14:13,079 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5250, loss[loss=0.05303, simple_loss=0.07255, pruned_loss=0.008072, audio_tagging_loss=0.00868, over 15346.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08842, pruned_loss=0.01192, audio_tagging_loss=0.008581, over 3033057.10 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:14:22,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3722340.0, ans=0.2 2023-11-27 04:14:26,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=12.0 2023-11-27 04:14:27,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3722406.6666666665, ans=0.125 2023-11-27 04:14:29,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0 2023-11-27 04:14:46,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3722540.0, ans=0.125 2023-11-27 04:14:47,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.94 vs. limit=10.0 2023-11-27 04:14:47,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3722540.0, ans=0.125 2023-11-27 04:14:55,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3722540.0, ans=0.125 2023-11-27 04:14:57,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.46 vs. limit=22.5 2023-11-27 04:15:03,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3722606.6666666665, ans=0.125 2023-11-27 04:15:05,991 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558400 2023-11-27 04:15:09,433 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5300, loss[loss=0.06856, simple_loss=0.09709, pruned_loss=0.01235, audio_tagging_loss=0.007664, over 15135.00 frames. 
], tot_loss[loss=0.0652, simple_loss=0.08936, pruned_loss=0.01203, audio_tagging_loss=0.008498, over 3035491.30 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:15:15,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3722673.3333333335, ans=0.125 2023-11-27 04:15:24,824 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.122e+01 9.674e+01 1.051e+02 1.467e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 04:15:33,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3722806.6666666665, ans=0.0 2023-11-27 04:15:53,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3722940.0, ans=0.0 2023-11-27 04:16:02,263 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558450 2023-11-27 04:16:05,424 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5350, loss[loss=0.07375, simple_loss=0.1154, pruned_loss=0.008746, audio_tagging_loss=0.007311, over 15298.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08937, pruned_loss=0.01195, audio_tagging_loss=0.008472, over 3032378.80 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:16:22,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3723073.3333333335, ans=0.05 2023-11-27 04:16:57,339 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558500 2023-11-27 04:17:00,459 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5400, loss[loss=0.08508, simple_loss=0.1257, pruned_loss=0.01498, audio_tagging_loss=0.007273, over 15716.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08924, pruned_loss=0.0119, audio_tagging_loss=0.008504, over 3044368.54 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:17:00,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3723340.0, ans=0.0 2023-11-27 04:17:09,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3723340.0, ans=0.0 2023-11-27 04:17:10,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3723340.0, ans=0.125 2023-11-27 04:17:10,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3723340.0, ans=0.0 2023-11-27 04:17:16,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.704e+01 9.000e+01 9.552e+01 1.029e+02 2.043e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-27 04:17:42,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3723540.0, ans=0.125 2023-11-27 04:17:49,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. 
limit=22.5 2023-11-27 04:17:53,672 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558550 2023-11-27 04:17:54,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3723606.6666666665, ans=0.0 2023-11-27 04:17:57,347 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5450, loss[loss=0.06332, simple_loss=0.0871, pruned_loss=0.01029, audio_tagging_loss=0.009474, over 14368.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08959, pruned_loss=0.01209, audio_tagging_loss=0.008606, over 3048354.40 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:18:00,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3723673.3333333335, ans=0.125 2023-11-27 04:18:05,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3723673.3333333335, ans=0.0 2023-11-27 04:18:16,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3723740.0, ans=0.2 2023-11-27 04:18:21,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3723806.6666666665, ans=0.0 2023-11-27 04:18:28,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.92 vs. limit=15.0 2023-11-27 04:18:40,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3723940.0, ans=0.07 2023-11-27 04:18:40,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3723940.0, ans=0.125 2023-11-27 04:18:49,209 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558600 2023-11-27 04:18:49,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3723940.0, ans=0.125 2023-11-27 04:18:52,634 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5500, loss[loss=0.0527, simple_loss=0.06947, pruned_loss=0.009239, audio_tagging_loss=0.008722, over 14989.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08922, pruned_loss=0.01198, audio_tagging_loss=0.008637, over 3043475.67 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:19:07,820 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.127e+01 9.786e+01 1.057e+02 1.357e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-27 04:19:40,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0 2023-11-27 04:19:45,200 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558650 2023-11-27 04:19:48,356 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5550, loss[loss=0.05942, simple_loss=0.08881, pruned_loss=0.0091, audio_tagging_loss=0.005917, over 15188.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08834, pruned_loss=0.01175, audio_tagging_loss=0.008684, over 3047285.89 frames. 
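Most of the scaling.py traffic above consists of ScheduledFloat entries: each names a float hyperparameter (a dropout probability, skip rate, or balancer bound) whose current value ("ans") is looked up from the global batch_count. A plausible minimal model, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class below is illustrative, and icefall's ScheduledFloat differs in detail:

class PiecewiseLinearFloat:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # (batch_count, value) breakpoints

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        raise AssertionError("unreachable: batch_count inside breakpoint range")

# e.g. a skip rate that decays from 0.5 to 0.0 over the first 20k batches and
# stays there, which is why these late-training entries log ans=0.0:
skip_rate = PiecewiseLinearFloat((0.0, 0.5), (20000.0, 0.0))
assert skip_rate.value_at(3_717_473) == 0.0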
], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:19:48,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3724340.0, ans=0.0 2023-11-27 04:20:03,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3724406.6666666665, ans=0.0 2023-11-27 04:20:14,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2023-11-27 04:20:18,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3724473.3333333335, ans=0.125 2023-11-27 04:20:41,508 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558700 2023-11-27 04:20:44,609 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5600, loss[loss=0.0707, simple_loss=0.09166, pruned_loss=0.0155, audio_tagging_loss=0.009371, over 14583.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08921, pruned_loss=0.01187, audio_tagging_loss=0.008818, over 3050245.49 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:20:59,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3724740.0, ans=0.0 2023-11-27 04:21:00,065 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 9.029e+01 9.775e+01 1.051e+02 1.247e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-27 04:21:07,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3724806.6666666665, ans=0.1 2023-11-27 04:21:07,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-11-27 04:21:12,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3724806.6666666665, ans=0.0 2023-11-27 04:21:23,809 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:21:24,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2023-11-27 04:21:29,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3724940.0, ans=0.2 2023-11-27 04:21:31,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3724940.0, ans=0.07 2023-11-27 04:21:37,136 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558750 2023-11-27 04:21:40,278 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5650, loss[loss=0.06246, simple_loss=0.07755, pruned_loss=0.01407, audio_tagging_loss=0.009618, over 14824.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08993, pruned_loss=0.01192, audio_tagging_loss=0.008801, over 3058350.43 frames. 
], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:21:41,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.82 vs. limit=10.0 2023-11-27 04:22:08,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-11-27 04:22:27,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3725273.3333333335, ans=0.125 2023-11-27 04:22:33,027 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558800 2023-11-27 04:22:36,439 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5700, loss[loss=0.07147, simple_loss=0.0959, pruned_loss=0.01639, audio_tagging_loss=0.007132, over 14654.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0894, pruned_loss=0.01193, audio_tagging_loss=0.008864, over 3050780.65 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:22:36,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-27 04:22:45,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=22.5 2023-11-27 04:22:53,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3725406.6666666665, ans=0.0 2023-11-27 04:22:53,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 9.100e+01 9.627e+01 1.012e+02 1.597e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-27 04:23:16,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3725540.0, ans=0.05 2023-11-27 04:23:27,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3725606.6666666665, ans=0.125 2023-11-27 04:23:28,769 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558850 2023-11-27 04:23:32,480 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5750, loss[loss=0.06831, simple_loss=0.09936, pruned_loss=0.01107, audio_tagging_loss=0.007556, over 15243.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08901, pruned_loss=0.01185, audio_tagging_loss=0.008732, over 3045652.97 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:23:44,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3725740.0, ans=0.125 2023-11-27 04:23:47,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.17 vs. limit=22.5 2023-11-27 04:23:48,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3725740.0, ans=0.0 2023-11-27 04:23:54,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3725806.6666666665, ans=0.015 2023-11-27 04:23:59,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. 
limit=15.0 2023-11-27 04:24:04,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3725873.3333333335, ans=0.125 2023-11-27 04:24:22,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2023-11-27 04:24:23,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3725940.0, ans=0.2 2023-11-27 04:24:25,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558900 2023-11-27 04:24:25,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.76 vs. limit=10.0 2023-11-27 04:24:25,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=12.0 2023-11-27 04:24:28,361 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5800, loss[loss=0.09169, simple_loss=0.1268, pruned_loss=0.02016, audio_tagging_loss=0.008136, over 16338.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08831, pruned_loss=0.01175, audio_tagging_loss=0.008719, over 3044672.05 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:24:42,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2023-11-27 04:24:44,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.898e+01 9.503e+01 1.014e+02 1.698e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-27 04:24:54,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3726140.0, ans=0.125 2023-11-27 04:24:58,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.24 vs. limit=12.0 2023-11-27 04:25:06,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=10.0 2023-11-27 04:25:20,301 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558950 2023-11-27 04:25:23,510 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5850, loss[loss=0.06346, simple_loss=0.09489, pruned_loss=0.009631, audio_tagging_loss=0.006383, over 14370.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08781, pruned_loss=0.01165, audio_tagging_loss=0.008648, over 3038789.05 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:25:24,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2023-11-27 04:25:24,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3726340.0, ans=0.0 2023-11-27 04:25:45,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.04 vs. 
limit=15.0 2023-11-27 04:25:48,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3726473.3333333335, ans=0.125 2023-11-27 04:25:57,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-27 04:26:04,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3726540.0, ans=0.09899494936611666 2023-11-27 04:26:06,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3726540.0, ans=0.125 2023-11-27 04:26:16,685 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559000 2023-11-27 04:26:20,663 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5900, loss[loss=0.05702, simple_loss=0.07392, pruned_loss=0.01192, audio_tagging_loss=0.008144, over 15785.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08804, pruned_loss=0.01183, audio_tagging_loss=0.008544, over 3043993.31 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:26:21,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3726673.3333333335, ans=0.125 2023-11-27 04:26:35,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3726740.0, ans=0.125 2023-11-27 04:26:37,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 9.105e+01 9.736e+01 1.055e+02 1.471e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 04:27:04,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3726940.0, ans=0.125 2023-11-27 04:27:05,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3726940.0, ans=0.2 2023-11-27 04:27:13,050 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559050 2023-11-27 04:27:16,214 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5950, loss[loss=0.0608, simple_loss=0.07537, pruned_loss=0.01379, audio_tagging_loss=0.009329, over 13938.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08818, pruned_loss=0.01192, audio_tagging_loss=0.008551, over 3049251.82 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:27:30,196 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:27:34,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.86 vs. 
limit=22.5 2023-11-27 04:27:41,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3727140.0, ans=0.1 2023-11-27 04:27:51,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3727206.6666666665, ans=0.09899494936611666 2023-11-27 04:28:07,738 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559100 2023-11-27 04:28:07,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3727273.3333333335, ans=0.2 2023-11-27 04:28:10,860 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6000, loss[loss=0.05451, simple_loss=0.06498, pruned_loss=0.012, audio_tagging_loss=0.01002, over 14835.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08868, pruned_loss=0.01203, audio_tagging_loss=0.008514, over 3046723.70 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:28:10,861 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 04:28:40,323 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4735, 3.8145, 4.3474, 3.5494], device='cuda:1') 2023-11-27 04:28:43,422 INFO [train_asr.py:1267] (1/4) Epoch 47, validation: loss=0.05733, simple_loss=0.05048, pruned_loss=0.005338, audio_tagging_loss=0.02675, over 4681554.00 frames. 2023-11-27 04:28:43,423 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 04:28:56,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3727406.6666666665, ans=0.0 2023-11-27 04:28:59,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 9.039e+01 9.599e+01 1.058e+02 1.819e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 04:29:21,494 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:29:30,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3727606.6666666665, ans=0.0 2023-11-27 04:29:35,876 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559150 2023-11-27 04:29:38,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2023-11-27 04:29:39,054 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6050, loss[loss=0.05826, simple_loss=0.07656, pruned_loss=0.009447, audio_tagging_loss=0.01054, over 16054.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08876, pruned_loss=0.01195, audio_tagging_loss=0.008516, over 3052462.60 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:29:49,937 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:30:08,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.44 vs. 
limit=22.5 2023-11-27 04:30:25,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3727940.0, ans=0.0 2023-11-27 04:30:27,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3727940.0, ans=0.125 2023-11-27 04:30:31,209 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559200 2023-11-27 04:30:34,600 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6100, loss[loss=0.07979, simple_loss=0.1115, pruned_loss=0.0163, audio_tagging_loss=0.007741, over 14597.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08794, pruned_loss=0.01174, audio_tagging_loss=0.00856, over 3051286.05 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:30:43,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3728006.6666666665, ans=0.125 2023-11-27 04:30:50,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3728073.3333333335, ans=0.125 2023-11-27 04:30:53,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.839e+01 9.378e+01 9.986e+01 1.390e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 04:30:54,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=22.5 2023-11-27 04:31:10,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3728206.6666666665, ans=0.125 2023-11-27 04:31:12,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-27 04:31:13,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3728206.6666666665, ans=0.2 2023-11-27 04:31:19,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3728273.3333333335, ans=0.1 2023-11-27 04:31:26,136 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559250 2023-11-27 04:31:30,231 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6150, loss[loss=0.0691, simple_loss=0.09139, pruned_loss=0.01601, audio_tagging_loss=0.007402, over 14820.00 frames. ], tot_loss[loss=0.06405, simple_loss=0.08775, pruned_loss=0.01161, audio_tagging_loss=0.008567, over 3044850.59 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:32:22,922 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559300 2023-11-27 04:32:26,080 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6200, loss[loss=0.06678, simple_loss=0.09058, pruned_loss=0.01292, audio_tagging_loss=0.008571, over 15142.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08745, pruned_loss=0.01169, audio_tagging_loss=0.008605, over 3038221.93 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:32:33,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3728673.3333333335, ans=0.2 2023-11-27 04:32:42,989 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.961e+01 9.532e+01 1.029e+02 1.294e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 04:32:50,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3728806.6666666665, ans=0.07 2023-11-27 04:33:08,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3728873.3333333335, ans=0.125 2023-11-27 04:33:17,926 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559350 2023-11-27 04:33:21,007 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6250, loss[loss=0.07699, simple_loss=0.1092, pruned_loss=0.01367, audio_tagging_loss=0.008709, over 15274.00 frames. ], tot_loss[loss=0.06391, simple_loss=0.08717, pruned_loss=0.01164, audio_tagging_loss=0.008682, over 3035689.52 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:33:29,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3729006.6666666665, ans=0.0 2023-11-27 04:34:06,875 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:34:09,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3729273.3333333335, ans=0.0 2023-11-27 04:34:09,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3729273.3333333335, ans=0.125 2023-11-27 04:34:13,106 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559400 2023-11-27 04:34:15,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3729340.0, ans=0.125 2023-11-27 04:34:16,468 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6300, loss[loss=0.06729, simple_loss=0.1033, pruned_loss=0.009491, audio_tagging_loss=0.006133, over 15803.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08851, pruned_loss=0.01196, audio_tagging_loss=0.008656, over 3038480.70 frames. 
], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:34:19,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3729340.0, ans=0.125 2023-11-27 04:34:35,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.982e+01 9.657e+01 1.038e+02 1.298e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 04:35:01,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3729606.6666666665, ans=0.125 2023-11-27 04:35:06,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3729606.6666666665, ans=0.0 2023-11-27 04:35:10,055 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559450 2023-11-27 04:35:11,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3729606.6666666665, ans=0.1 2023-11-27 04:35:13,211 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6350, loss[loss=0.05922, simple_loss=0.07888, pruned_loss=0.007361, audio_tagging_loss=0.01242, over 15132.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08727, pruned_loss=0.01179, audio_tagging_loss=0.008831, over 3038300.42 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:35:17,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3729673.3333333335, ans=0.0 2023-11-27 04:35:25,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3729740.0, ans=0.125 2023-11-27 04:35:55,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3729873.3333333335, ans=0.0 2023-11-27 04:35:56,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3729940.0, ans=15.0 2023-11-27 04:35:57,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3729940.0, ans=0.0 2023-11-27 04:35:59,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3729940.0, ans=0.2 2023-11-27 04:36:01,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3729940.0, ans=0.0 2023-11-27 04:36:05,612 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559500 2023-11-27 04:36:05,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3729940.0, ans=0.2 2023-11-27 04:36:06,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3729940.0, ans=0.125 2023-11-27 04:36:07,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3730006.6666666665, ans=0.05 2023-11-27 04:36:08,711 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6400, loss[loss=0.08111, simple_loss=0.1087, pruned_loss=0.01764, audio_tagging_loss=0.009114, over 15112.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08824, pruned_loss=0.01197, audio_tagging_loss=0.008858, over 3043777.26 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:36:14,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3730006.6666666665, ans=0.0 2023-11-27 04:36:15,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3730006.6666666665, ans=0.09899494936611666 2023-11-27 04:36:15,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2023-11-27 04:36:20,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3730073.3333333335, ans=0.2 2023-11-27 04:36:20,546 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:36:26,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 9.070e+01 9.519e+01 1.025e+02 1.551e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:36:29,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3730073.3333333335, ans=0.0 2023-11-27 04:37:00,623 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559550 2023-11-27 04:37:03,654 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6450, loss[loss=0.08631, simple_loss=0.1121, pruned_loss=0.01909, audio_tagging_loss=0.01119, over 15423.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08801, pruned_loss=0.01198, audio_tagging_loss=0.008971, over 3041088.90 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:37:21,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3730406.6666666665, ans=0.0 2023-11-27 04:37:43,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3730540.0, ans=0.95 2023-11-27 04:37:51,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3730606.6666666665, ans=0.1 2023-11-27 04:37:55,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3730606.6666666665, ans=0.125 2023-11-27 04:37:56,982 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559600 2023-11-27 04:38:00,433 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6500, loss[loss=0.05729, simple_loss=0.07682, pruned_loss=0.009844, audio_tagging_loss=0.009031, over 15138.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.0888, pruned_loss=0.01201, audio_tagging_loss=0.008904, over 3044346.46 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:38:12,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0 2023-11-27 04:38:15,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.06 vs. 
limit=15.0 2023-11-27 04:38:18,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.947e+01 9.565e+01 1.041e+02 1.320e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 04:38:20,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3730740.0, ans=0.0 2023-11-27 04:38:34,242 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:38:36,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3730873.3333333335, ans=0.5 2023-11-27 04:38:45,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0 2023-11-27 04:38:49,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-27 04:38:51,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3730940.0, ans=0.125 2023-11-27 04:38:51,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.24 vs. limit=10.0 2023-11-27 04:38:53,115 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559650 2023-11-27 04:38:56,214 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6550, loss[loss=0.09384, simple_loss=0.1309, pruned_loss=0.02237, audio_tagging_loss=0.006008, over 15632.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0896, pruned_loss=0.01216, audio_tagging_loss=0.008697, over 3048182.81 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:39:16,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3731073.3333333335, ans=0.09899494936611666 2023-11-27 04:39:36,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3731206.6666666665, ans=0.125 2023-11-27 04:39:47,936 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559700 2023-11-27 04:39:51,016 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6600, loss[loss=0.05651, simple_loss=0.07701, pruned_loss=0.0107, audio_tagging_loss=0.007313, over 15825.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08836, pruned_loss=0.01195, audio_tagging_loss=0.008565, over 3047051.65 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:40:03,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3731406.6666666665, ans=0.2 2023-11-27 04:40:09,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.811e+01 8.917e+01 9.565e+01 1.016e+02 1.189e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 04:40:23,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3731473.3333333335, ans=0.1 2023-11-27 04:40:24,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3731540.0, ans=0.0 2023-11-27 04:40:31,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.73 vs. limit=15.0 2023-11-27 04:40:44,496 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559750 2023-11-27 04:40:47,624 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6650, loss[loss=0.04531, simple_loss=0.0604, pruned_loss=0.005994, audio_tagging_loss=0.009117, over 15545.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.0872, pruned_loss=0.01177, audio_tagging_loss=0.008591, over 3044088.88 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:40:55,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3731673.3333333335, ans=0.125 2023-11-27 04:40:55,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3731673.3333333335, ans=0.125 2023-11-27 04:41:22,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3731873.3333333335, ans=0.04949747468305833 2023-11-27 04:41:31,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3731940.0, ans=0.125 2023-11-27 04:41:39,521 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559800 2023-11-27 04:41:42,946 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6700, loss[loss=0.07916, simple_loss=0.1123, pruned_loss=0.01633, audio_tagging_loss=0.006668, over 16055.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08841, pruned_loss=0.012, audio_tagging_loss=0.008437, over 3045577.65 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:41:43,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3732006.6666666665, ans=0.125 2023-11-27 04:42:03,041 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.829e+01 9.518e+01 1.003e+02 1.219e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:42:11,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. 
2023-11-27 04:42:22,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3732206.6666666665, ans=0.09899494936611666 2023-11-27 04:42:35,427 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559850 2023-11-27 04:42:35,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.62 vs. limit=10.0 2023-11-27 04:42:38,561 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6750, loss[loss=0.06059, simple_loss=0.08503, pruned_loss=0.01111, audio_tagging_loss=0.006964, over 15023.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08824, pruned_loss=0.01191, audio_tagging_loss=0.008472, over 3034943.37 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:42:39,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3732340.0, ans=0.125 2023-11-27 04:42:57,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3732406.6666666665, ans=0.0 2023-11-27 04:43:05,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3732473.3333333335, ans=0.125 2023-11-27 04:43:11,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3732540.0, ans=0.125 2023-11-27 04:43:27,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3732606.6666666665, ans=0.125 2023-11-27 04:43:28,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3732606.6666666665, ans=0.125 2023-11-27 04:43:31,370 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559900 2023-11-27 04:43:31,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3732606.6666666665, ans=0.125 2023-11-27 04:43:35,036 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6800, loss[loss=0.05904, simple_loss=0.0789, pruned_loss=0.01155, audio_tagging_loss=0.008043, over 14314.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08795, pruned_loss=0.01184, audio_tagging_loss=0.008503, over 3037695.57 frames.
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:43:35,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3732673.3333333335, ans=0.125 2023-11-27 04:43:43,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3732673.3333333335, ans=0.125 2023-11-27 04:43:54,088 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.067e+01 9.528e+01 1.036e+02 1.458e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 04:43:58,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3732806.6666666665, ans=0.2 2023-11-27 04:44:16,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3732873.3333333335, ans=0.125 2023-11-27 04:44:19,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3732940.0, ans=0.125 2023-11-27 04:44:21,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3732940.0, ans=0.1 2023-11-27 04:44:26,867 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559950 2023-11-27 04:44:29,968 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6850, loss[loss=0.06335, simple_loss=0.08541, pruned_loss=0.01152, audio_tagging_loss=0.009127, over 15304.00 frames. ], tot_loss[loss=0.06405, simple_loss=0.08745, pruned_loss=0.01179, audio_tagging_loss=0.008525, over 3031158.30 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:44:32,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3733006.6666666665, ans=0.0 2023-11-27 04:44:40,733 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:44:48,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3733073.3333333335, ans=0.2 2023-11-27 04:44:50,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3733140.0, ans=0.125 2023-11-27 04:44:52,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3733140.0, ans=0.0 2023-11-27 04:44:58,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3733140.0, ans=0.05 2023-11-27 04:45:07,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2023-11-27 04:45:21,867 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560000 2023-11-27 04:45:27,243 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6900, loss[loss=0.06852, simple_loss=0.09662, pruned_loss=0.01136, audio_tagging_loss=0.008844, over 15588.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08882, pruned_loss=0.01193, audio_tagging_loss=0.00849, over 3043346.27 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:45:45,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.00 vs. 
limit=22.5 2023-11-27 04:45:48,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.976e+01 9.495e+01 1.031e+02 1.745e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 04:45:48,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3733406.6666666665, ans=0.125 2023-11-27 04:46:02,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2023-11-27 04:46:04,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3733540.0, ans=0.125 2023-11-27 04:46:06,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3733540.0, ans=0.07 2023-11-27 04:46:09,628 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:46:09,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3733540.0, ans=0.125 2023-11-27 04:46:15,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3733606.6666666665, ans=0.125 2023-11-27 04:46:20,313 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560050 2023-11-27 04:46:23,933 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6950, loss[loss=0.06223, simple_loss=0.08176, pruned_loss=0.01121, audio_tagging_loss=0.01014, over 15536.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08829, pruned_loss=0.01192, audio_tagging_loss=0.008602, over 3040982.86 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:46:28,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3733673.3333333335, ans=0.2 2023-11-27 04:47:15,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3733940.0, ans=0.125 2023-11-27 04:47:16,343 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560100 2023-11-27 04:47:19,469 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7000, loss[loss=0.08947, simple_loss=0.1223, pruned_loss=0.01829, audio_tagging_loss=0.01001, over 14889.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08808, pruned_loss=0.01195, audio_tagging_loss=0.008688, over 3037568.15 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:47:39,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.711e+01 9.428e+01 1.006e+02 1.288e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 04:47:44,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3734140.0, ans=0.125 2023-11-27 04:47:50,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3734140.0, ans=0.2 2023-11-27 04:47:57,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-11-27 04:48:03,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0 2023-11-27 04:48:07,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3734273.3333333335, ans=0.0 2023-11-27 04:48:11,189 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560150 2023-11-27 04:48:14,253 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7050, loss[loss=0.07395, simple_loss=0.1061, pruned_loss=0.01303, audio_tagging_loss=0.007848, over 14995.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08786, pruned_loss=0.01174, audio_tagging_loss=0.008761, over 3042514.25 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 4.0 2023-11-27 04:48:32,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3734406.6666666665, ans=0.1 2023-11-27 04:49:06,010 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560200 2023-11-27 04:49:10,414 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7100, loss[loss=0.05262, simple_loss=0.06441, pruned_loss=0.01076, audio_tagging_loss=0.009649, over 14869.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08739, pruned_loss=0.01164, audio_tagging_loss=0.008793, over 3042696.25 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:49:32,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.003e+01 9.804e+01 1.066e+02 3.214e+02, threshold=1.961e+02, percent-clipped=1.0 2023-11-27 04:49:45,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3734873.3333333335, ans=0.07 2023-11-27 04:49:46,033 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:49:51,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3734873.3333333335, ans=0.125 2023-11-27 04:49:56,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3734940.0, ans=0.125 2023-11-27 04:50:02,377 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560250 2023-11-27 04:50:02,499 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:50:05,484 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7150, loss[loss=0.05237, simple_loss=0.07268, pruned_loss=0.005862, audio_tagging_loss=0.01017, over 15950.00 frames. 
], tot_loss[loss=0.06519, simple_loss=0.08897, pruned_loss=0.01197, audio_tagging_loss=0.008729, over 3048907.55 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:50:12,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3735006.6666666665, ans=0.1 2023-11-27 04:50:33,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3735140.0, ans=0.125 2023-11-27 04:50:57,369 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560300 2023-11-27 04:51:00,453 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7200, loss[loss=0.05915, simple_loss=0.07929, pruned_loss=0.01059, audio_tagging_loss=0.008916, over 16289.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08897, pruned_loss=0.01188, audio_tagging_loss=0.0088, over 3049431.47 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:51:12,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3735406.6666666665, ans=0.125 2023-11-27 04:51:23,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.928e+01 9.518e+01 1.020e+02 1.389e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:51:30,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2023-11-27 04:51:39,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3735540.0, ans=0.0 2023-11-27 04:51:42,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3735540.0, ans=0.1 2023-11-27 04:51:43,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3735606.6666666665, ans=0.0 2023-11-27 04:51:52,257 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560350 2023-11-27 04:51:55,943 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7250, loss[loss=0.06908, simple_loss=0.08824, pruned_loss=0.01592, audio_tagging_loss=0.009042, over 14496.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08947, pruned_loss=0.0121, audio_tagging_loss=0.008803, over 3043323.01 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:52:24,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3735806.6666666665, ans=0.125 2023-11-27 04:52:24,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3735806.6666666665, ans=0.1 2023-11-27 04:52:24,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=15.0 2023-11-27 04:52:37,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.15 vs. 
limit=15.0 2023-11-27 04:52:39,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3735940.0, ans=0.125 2023-11-27 04:52:41,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3735940.0, ans=0.125 2023-11-27 04:52:44,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=12.0 2023-11-27 04:52:45,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2023-11-27 04:52:48,925 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560400 2023-11-27 04:52:52,310 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7300, loss[loss=0.05883, simple_loss=0.08485, pruned_loss=0.009569, audio_tagging_loss=0.006839, over 15468.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08894, pruned_loss=0.0119, audio_tagging_loss=0.00873, over 3049787.35 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:53:08,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3736073.3333333335, ans=0.125 2023-11-27 04:53:13,372 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.819e+01 9.611e+01 1.034e+02 1.337e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 04:53:17,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3736140.0, ans=0.125 2023-11-27 04:53:20,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3736140.0, ans=0.125 2023-11-27 04:53:23,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3736140.0, ans=0.2 2023-11-27 04:53:34,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3736206.6666666665, ans=0.125 2023-11-27 04:53:44,147 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560450 2023-11-27 04:53:46,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3736340.0, ans=0.2 2023-11-27 04:53:47,243 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7350, loss[loss=0.04906, simple_loss=0.06938, pruned_loss=0.009414, audio_tagging_loss=0.004961, over 14912.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08904, pruned_loss=0.01206, audio_tagging_loss=0.008607, over 3043947.36 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:54:26,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3736540.0, ans=0.2 2023-11-27 04:54:33,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3736606.6666666665, ans=0.125 2023-11-27 04:54:38,961 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560500 2023-11-27 04:54:42,037 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7400, loss[loss=0.06176, simple_loss=0.08926, pruned_loss=0.0091, audio_tagging_loss=0.008035, over 14919.00 frames. 
], tot_loss[loss=0.06508, simple_loss=0.08921, pruned_loss=0.01198, audio_tagging_loss=0.008502, over 3044625.09 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:54:43,374 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:54:52,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3736673.3333333335, ans=0.1 2023-11-27 04:55:04,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3736806.6666666665, ans=0.0 2023-11-27 04:55:05,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 8.988e+01 9.437e+01 1.029e+02 2.461e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-27 04:55:16,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=22.5 2023-11-27 04:55:35,710 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560550 2023-11-27 04:55:38,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3737006.6666666665, ans=0.2 2023-11-27 04:55:39,278 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7450, loss[loss=0.07676, simple_loss=0.1067, pruned_loss=0.01651, audio_tagging_loss=0.006884, over 16537.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08931, pruned_loss=0.01208, audio_tagging_loss=0.008451, over 3044868.08 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:55:45,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3737006.6666666665, ans=0.125 2023-11-27 04:55:53,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3737073.3333333335, ans=0.0 2023-11-27 04:55:54,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0 2023-11-27 04:56:09,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3737140.0, ans=0.2 2023-11-27 04:56:16,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3737206.6666666665, ans=0.125 2023-11-27 04:56:22,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3737273.3333333335, ans=0.125 2023-11-27 04:56:31,232 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560600 2023-11-27 04:56:34,667 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7500, loss[loss=0.07082, simple_loss=0.09175, pruned_loss=0.0145, audio_tagging_loss=0.01044, over 14529.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08918, pruned_loss=0.01195, audio_tagging_loss=0.008464, over 3047191.15 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:56:48,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. 
limit=10.0 2023-11-27 04:56:50,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3737406.6666666665, ans=0.125 2023-11-27 04:56:56,834 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.930e+01 9.677e+01 1.022e+02 1.348e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 04:57:26,551 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560650 2023-11-27 04:57:29,692 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7550, loss[loss=0.05047, simple_loss=0.06549, pruned_loss=0.009677, audio_tagging_loss=0.008046, over 15774.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08786, pruned_loss=0.01181, audio_tagging_loss=0.008531, over 3048757.40 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:57:44,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=15.0 2023-11-27 04:57:50,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3737740.0, ans=0.025 2023-11-27 04:57:53,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3737806.6666666665, ans=0.0 2023-11-27 04:57:59,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3737806.6666666665, ans=0.1 2023-11-27 04:58:04,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3737873.3333333335, ans=0.0 2023-11-27 04:58:23,185 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560700 2023-11-27 04:58:26,296 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7600, loss[loss=0.07957, simple_loss=0.1093, pruned_loss=0.01747, audio_tagging_loss=0.007436, over 15139.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08764, pruned_loss=0.0118, audio_tagging_loss=0.008588, over 3054965.54 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:58:45,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3738073.3333333335, ans=0.125 2023-11-27 04:58:45,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3738073.3333333335, ans=0.0 2023-11-27 04:58:48,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.653e+01 9.351e+01 1.003e+02 1.286e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-27 04:58:56,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3738140.0, ans=0.0 2023-11-27 04:58:57,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.51 vs. limit=10.0
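Note on the [scaling.py:213] entries: each reports a ScheduledFloat, a regularization constant (skip rates, balancer probabilities, minimum scales, dropout) whose current value `ans` depends on `batch_count`. In icefall's scaling.py these values follow a piecewise-linear schedule over (batch_count, value) breakpoints; the sketch below shows the idea, with made-up breakpoints rather than the recipe's actual ones.

```python
# Sketch: evaluate a piecewise-linear float schedule keyed on batch_count.
# Breakpoints here are illustrative, not the values this recipe uses.
import bisect

class PiecewiseLinear:
    def __init__(self, *points):  # points: (batch_count, value), ascending
        self.x = [p[0] for p in points]
        self.y = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.x[0]:
            return self.y[0]
        if batch_count >= self.x[-1]:
            return self.y[-1]
        i = bisect.bisect_right(self.x, batch_count)
        x0, x1 = self.x[i - 1], self.x[i]
        y0, y1 = self.y[i - 1], self.y[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate that decays from 0.1 to 0.0 over the first 20k batches:
skip_rate = PiecewiseLinear((0.0, 0.1), (20000.0, 0.0))
print(skip_rate(3738073.0))  # far past the last breakpoint -> 0.0
```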
2023-11-27 04:59:05,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3738206.6666666665, ans=0.125 2023-11-27 04:59:10,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3738273.3333333335, ans=0.0 2023-11-27 04:59:18,816 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560750 2023-11-27 04:59:22,042 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7650, loss[loss=0.06825, simple_loss=0.08944, pruned_loss=0.01341, audio_tagging_loss=0.01013, over 14563.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08793, pruned_loss=0.01197, audio_tagging_loss=0.008558, over 3051776.43 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:59:23,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3738340.0, ans=0.0 2023-11-27 05:00:02,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3738540.0, ans=0.0 2023-11-27 05:00:07,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.65 vs. limit=5.0 2023-11-27 05:00:09,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3738606.6666666665, ans=0.0 2023-11-27 05:00:13,720 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560800 2023-11-27 05:00:17,091 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7700, loss[loss=0.06057, simple_loss=0.07954, pruned_loss=0.01077, audio_tagging_loss=0.01002, over 13923.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08788, pruned_loss=0.01196, audio_tagging_loss=0.008526, over 3055566.06 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:00:21,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3738673.3333333335, ans=0.0 2023-11-27 05:00:39,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3738806.6666666665, ans=0.0 2023-11-27 05:00:40,057 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.124e+01 9.752e+01 1.036e+02 1.277e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 05:00:47,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3738806.6666666665, ans=0.07 2023-11-27 05:01:05,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3738940.0, ans=0.1 2023-11-27 05:01:09,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3738940.0, ans=0.125 2023-11-27 05:01:09,977 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560850 2023-11-27 05:01:13,596 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7750, loss[loss=0.06763, simple_loss=0.1006, pruned_loss=0.009456, audio_tagging_loss=0.007852, over 15623.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08781, pruned_loss=0.01178, audio_tagging_loss=0.008508, over 3054676.52 frames.
], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:01:20,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3739006.6666666665, ans=0.0 2023-11-27 05:01:37,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3739140.0, ans=0.0 2023-11-27 05:02:05,326 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560900 2023-11-27 05:02:08,483 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7800, loss[loss=0.06493, simple_loss=0.09141, pruned_loss=0.01119, audio_tagging_loss=0.008041, over 16267.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.08743, pruned_loss=0.01181, audio_tagging_loss=0.008595, over 3052929.71 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:02:12,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3739340.0, ans=0.1 2023-11-27 05:02:12,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3739340.0, ans=0.125 2023-11-27 05:02:12,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3739340.0, ans=0.125 2023-11-27 05:02:27,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3739406.6666666665, ans=10.0 2023-11-27 05:02:31,131 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.958e+01 9.629e+01 1.040e+02 1.238e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 05:02:40,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3739473.3333333335, ans=0.0 2023-11-27 05:02:44,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3739540.0, ans=0.04949747468305833 2023-11-27 05:03:00,434 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560950 2023-11-27 05:03:00,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3739606.6666666665, ans=0.125 2023-11-27 05:03:03,535 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7850, loss[loss=0.0586, simple_loss=0.07449, pruned_loss=0.00993, audio_tagging_loss=0.01142, over 15312.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08841, pruned_loss=0.01201, audio_tagging_loss=0.008517, over 3046803.24 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:03:17,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2023-11-27 05:03:41,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2023-11-27 05:03:51,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2023-11-27 05:03:54,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.65 vs. 
limit=10.0 2023-11-27 05:03:56,057 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561000 2023-11-27 05:03:59,947 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7900, loss[loss=0.06606, simple_loss=0.0894, pruned_loss=0.01136, audio_tagging_loss=0.009996, over 15674.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08884, pruned_loss=0.0121, audio_tagging_loss=0.008687, over 3042475.99 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:04:01,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3740006.6666666665, ans=0.125 2023-11-27 05:04:04,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3740006.6666666665, ans=0.125 2023-11-27 05:04:15,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3740073.3333333335, ans=0.0 2023-11-27 05:04:23,046 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 9.150e+01 9.892e+01 1.047e+02 1.288e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-27 05:04:46,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.05 vs. limit=10.0 2023-11-27 05:04:52,361 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561050 2023-11-27 05:04:55,418 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7950, loss[loss=0.08051, simple_loss=0.1134, pruned_loss=0.01789, audio_tagging_loss=0.005916, over 15552.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08959, pruned_loss=0.01226, audio_tagging_loss=0.008694, over 3040226.76 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:05:01,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2023-11-27 05:05:08,730 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:05:09,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3740406.6666666665, ans=0.1 2023-11-27 05:05:47,377 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561100 2023-11-27 05:05:51,024 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8000, loss[loss=0.08036, simple_loss=0.1118, pruned_loss=0.01574, audio_tagging_loss=0.008705, over 15024.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08919, pruned_loss=0.01221, audio_tagging_loss=0.008773, over 3033880.31 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:05:58,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3740673.3333333335, ans=0.0 2023-11-27 05:05:59,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3740673.3333333335, ans=0.125 2023-11-27 05:06:12,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3740806.6666666665, ans=0.04949747468305833 2023-11-27 05:06:14,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 8.945e+01 9.427e+01 1.017e+02 1.273e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 05:06:17,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3740806.6666666665, ans=0.0 2023-11-27 05:06:32,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3740873.3333333335, ans=0.125 2023-11-27 05:06:39,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3740940.0, ans=0.0 2023-11-27 05:06:42,832 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561150 2023-11-27 05:06:46,432 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8050, loss[loss=0.05206, simple_loss=0.0651, pruned_loss=0.01083, audio_tagging_loss=0.008686, over 15198.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08829, pruned_loss=0.01207, audio_tagging_loss=0.00889, over 3036644.28 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:06:54,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2023-11-27 05:07:04,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3741073.3333333335, ans=0.05 2023-11-27 05:07:04,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3741073.3333333335, ans=0.125 2023-11-27 05:07:12,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3741140.0, ans=0.2 2023-11-27 05:07:36,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3741273.3333333335, ans=0.0 2023-11-27 05:07:39,320 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561200 2023-11-27 05:07:41,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3741340.0, ans=0.125 2023-11-27 05:07:42,691 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8100, loss[loss=0.04696, simple_loss=0.05936, pruned_loss=0.008312, audio_tagging_loss=0.00897, over 16176.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08911, pruned_loss=0.01215, audio_tagging_loss=0.008804, over 3043027.57 frames. 
], batch size: 64, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:08:02,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3741406.6666666665, ans=0.125 2023-11-27 05:08:05,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.986e+01 9.924e+01 1.066e+02 1.404e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-27 05:08:31,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-27 05:08:34,237 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561250 2023-11-27 05:08:37,363 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8150, loss[loss=0.09351, simple_loss=0.1475, pruned_loss=0.01531, audio_tagging_loss=0.004431, over 16928.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09057, pruned_loss=0.01236, audio_tagging_loss=0.008636, over 3042525.09 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:08:52,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2023-11-27 05:09:06,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.54 vs. limit=10.0 2023-11-27 05:09:18,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3741873.3333333335, ans=15.0 2023-11-27 05:09:29,739 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561300 2023-11-27 05:09:32,793 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8200, loss[loss=0.05975, simple_loss=0.07517, pruned_loss=0.01073, audio_tagging_loss=0.01143, over 15555.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09087, pruned_loss=0.01241, audio_tagging_loss=0.008541, over 3039007.96 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:09:32,837 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:09:57,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 9.029e+01 9.641e+01 1.058e+02 1.267e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 05:09:57,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3742140.0, ans=0.125 2023-11-27 05:09:58,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.12 vs. 
limit=15.0 2023-11-27 05:09:59,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3742140.0, ans=0.125 2023-11-27 05:10:25,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3742273.3333333335, ans=0.0 2023-11-27 05:10:26,195 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561350 2023-11-27 05:10:29,347 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8250, loss[loss=0.04859, simple_loss=0.05875, pruned_loss=0.006505, audio_tagging_loss=0.01271, over 16162.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09029, pruned_loss=0.01228, audio_tagging_loss=0.008591, over 3045962.92 frames. ], batch size: 63, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:10:29,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3742340.0, ans=0.125 2023-11-27 05:10:53,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0 2023-11-27 05:11:21,052 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561400 2023-11-27 05:11:24,416 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8300, loss[loss=0.05765, simple_loss=0.07878, pruned_loss=0.009282, audio_tagging_loss=0.008973, over 15106.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08982, pruned_loss=0.01216, audio_tagging_loss=0.00863, over 3042486.94 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:11:40,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=12.0 2023-11-27 05:11:49,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 8.979e+01 9.695e+01 1.038e+02 1.326e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 05:11:50,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=12.0 2023-11-27 05:11:53,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.52 vs. limit=10.0 2023-11-27 05:11:56,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2023-11-27 05:12:04,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3742873.3333333335, ans=0.0 2023-11-27 05:12:16,331 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561450 2023-11-27 05:12:19,468 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8350, loss[loss=0.04433, simple_loss=0.05954, pruned_loss=0.004917, audio_tagging_loss=0.00964, over 14137.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08991, pruned_loss=0.01212, audio_tagging_loss=0.008546, over 3039425.02 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:12:35,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
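Note on the grad_scale field in the loss lines (it moves between 32.0, 16.0, 8.0, 4.0 and back up over this run): with use_fp16=True this is the dynamic loss scale, which is halved when gradients overflow and periodically doubled after enough clean steps. A minimal sketch of the standard PyTorch AMP pattern follows, assuming a model that returns a scalar loss; the actual icefall training loop wraps its own bookkeeping and checkpointing around this.

```python
# Sketch: fp16 training step with dynamic loss scaling via torch.cuda.amp.
# Generic pattern only; `model` and `batch` are illustrative assumptions.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def train_step(model, optimizer, batch, device="cuda"):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch.to(device))       # assumes model returns a scalar loss
    scaler.scale(loss).backward()            # backprop on the scaled loss
    scaler.step(optimizer)                   # unscales; skips the step on inf/nan
    scaler.update()                          # halves or grows the scale as needed
    return loss.detach(), scaler.get_scale() # second value is the logged grad_scale
```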
2023-11-27 05:12:46,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3743140.0, ans=0.125 2023-11-27 05:12:47,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3743140.0, ans=0.125 2023-11-27 05:13:11,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0 2023-11-27 05:13:13,241 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561500 2023-11-27 05:13:16,289 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8400, loss[loss=0.06773, simple_loss=0.09748, pruned_loss=0.01153, audio_tagging_loss=0.007461, over 15303.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08877, pruned_loss=0.01187, audio_tagging_loss=0.008621, over 3033254.59 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:13:17,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2023-11-27 05:13:24,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2023-11-27 05:13:39,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.778e+01 9.390e+01 9.920e+01 1.165e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 05:14:00,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3743606.6666666665, ans=0.125 2023-11-27 05:14:01,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3743606.6666666665, ans=0.125 2023-11-27 05:14:02,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-11-27 05:14:08,327 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561550 2023-11-27 05:14:11,419 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8450, loss[loss=0.07147, simple_loss=0.1016, pruned_loss=0.01277, audio_tagging_loss=0.007887, over 14969.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08847, pruned_loss=0.0119, audio_tagging_loss=0.00857, over 3034122.82 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:14:11,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3743673.3333333335, ans=0.125 2023-11-27 05:14:40,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3743806.6666666665, ans=0.1 2023-11-27 05:14:42,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=22.5 2023-11-27 05:15:02,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.66 vs.
limit=12.0 2023-11-27 05:15:03,052 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561600 2023-11-27 05:15:06,454 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8500, loss[loss=0.08245, simple_loss=0.1135, pruned_loss=0.02046, audio_tagging_loss=0.005268, over 15737.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.0895, pruned_loss=0.01209, audio_tagging_loss=0.00863, over 3039663.92 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:15:21,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3744073.3333333335, ans=0.95 2023-11-27 05:15:31,430 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.032e+01 9.244e+01 9.812e+01 1.039e+02 1.357e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 05:15:35,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3744140.0, ans=0.1 2023-11-27 05:15:58,941 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561650 2023-11-27 05:16:03,181 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8550, loss[loss=0.07184, simple_loss=0.09934, pruned_loss=0.01319, audio_tagging_loss=0.008979, over 16988.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09053, pruned_loss=0.01234, audio_tagging_loss=0.008533, over 3046299.54 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:16:07,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.58 vs. limit=15.0 2023-11-27 05:16:16,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3744406.6666666665, ans=0.05 2023-11-27 05:16:16,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0 2023-11-27 05:16:17,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3744406.6666666665, ans=0.125 2023-11-27 05:16:31,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3744473.3333333335, ans=0.07 2023-11-27 05:16:54,648 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561700 2023-11-27 05:16:57,856 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8600, loss[loss=0.05362, simple_loss=0.07301, pruned_loss=0.007391, audio_tagging_loss=0.00973, over 15400.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09038, pruned_loss=0.01224, audio_tagging_loss=0.008567, over 3048151.95 frames. 
], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:17:15,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3744740.0, ans=0.2 2023-11-27 05:17:22,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.987e+01 9.604e+01 1.025e+02 1.300e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 05:17:25,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3744806.6666666665, ans=0.125 2023-11-27 05:17:32,107 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:17:50,173 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561750 2023-11-27 05:17:53,208 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8650, loss[loss=0.07809, simple_loss=0.1174, pruned_loss=0.01251, audio_tagging_loss=0.006858, over 15431.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09015, pruned_loss=0.01219, audio_tagging_loss=0.008632, over 3047741.31 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:18:01,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2023-11-27 05:18:06,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3745073.3333333335, ans=0.0 2023-11-27 05:18:19,886 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:18:31,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.79 vs. limit=10.0 2023-11-27 05:18:45,600 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561800 2023-11-27 05:18:50,064 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8700, loss[loss=0.07749, simple_loss=0.1152, pruned_loss=0.01419, audio_tagging_loss=0.005723, over 15235.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09099, pruned_loss=0.01233, audio_tagging_loss=0.008695, over 3050617.91 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:18:52,486 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:19:06,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3745406.6666666665, ans=0.125 2023-11-27 05:19:14,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 9.080e+01 9.687e+01 1.041e+02 1.884e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 05:19:26,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.13 vs. 
limit=22.5 2023-11-27 05:19:30,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3745540.0, ans=0.0 2023-11-27 05:19:42,780 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561850 2023-11-27 05:19:43,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3745606.6666666665, ans=0.2 2023-11-27 05:19:45,866 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8750, loss[loss=0.07982, simple_loss=0.1162, pruned_loss=0.01641, audio_tagging_loss=0.005329, over 14879.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09109, pruned_loss=0.0124, audio_tagging_loss=0.008746, over 3047818.46 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:19:59,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-27 05:20:02,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3745740.0, ans=0.1 2023-11-27 05:20:37,690 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561900 2023-11-27 05:20:40,936 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8800, loss[loss=0.06231, simple_loss=0.08131, pruned_loss=0.01322, audio_tagging_loss=0.008437, over 16699.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09161, pruned_loss=0.01254, audio_tagging_loss=0.00883, over 3048247.61 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:20:42,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.22 vs. limit=12.0 2023-11-27 05:21:07,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.208e+01 9.853e+01 1.073e+02 1.310e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 05:21:08,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3746140.0, ans=0.1 2023-11-27 05:21:19,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746206.6666666665, ans=0.1 2023-11-27 05:21:32,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561950 2023-11-27 05:21:37,191 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8850, loss[loss=0.06672, simple_loss=0.08887, pruned_loss=0.01201, audio_tagging_loss=0.01028, over 15533.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09023, pruned_loss=0.01219, audio_tagging_loss=0.008838, over 3042943.47 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:21:47,265 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
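Note on the recurring "Exclude cut" WARNING above: this 1-second AudioSet clip has 100 feature frames, which the convolutional frontend subsamples to 23, while its dummy transcript encodes to 24 BPE tokens; the transducer loss cannot align more tokens than output frames, so the cut is dropped. A minimal sketch of such a validity check follows; the subsampling formula ((x - 7) // 2 + 1) // 2 reproduces the logged 100 -> 23, but the function name and exact condition are illustrative.

```python
# Sketch: drop cuts whose BPE token sequence is too long for the subsampled
# frame sequence, reproducing the numbers in the WARNING lines.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("data/lang_bpe_500/bpe.model")

def is_valid_cut(cut) -> bool:
    T = ((cut.num_frames - 7) // 2 + 1) // 2  # frames after subsampling: 100 -> 23
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if T < len(tokens):                       # 23 frames < 24 tokens -> exclude
        print(f"Exclude cut with ID {cut.id} from training. "
              f"Number of frames (before subsampling): {cut.num_frames}. "
              f"Number of frames (after subsampling): {T}. "
              f"Number of tokens: {len(tokens)}")
        return False
    return True
```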
Number of tokens: 24 2023-11-27 05:22:10,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3746540.0, ans=0.09899494936611666 2023-11-27 05:22:28,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3746606.6666666665, ans=0.0 2023-11-27 05:22:29,709 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562000 2023-11-27 05:22:33,128 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8900, loss[loss=0.04653, simple_loss=0.06128, pruned_loss=0.006416, audio_tagging_loss=0.00948, over 15599.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09, pruned_loss=0.01205, audio_tagging_loss=0.008662, over 3046019.64 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:22:33,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3746673.3333333335, ans=0.125 2023-11-27 05:22:38,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3746673.3333333335, ans=0.125 2023-11-27 05:22:42,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3746673.3333333335, ans=0.125 2023-11-27 05:22:45,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3746740.0, ans=0.1 2023-11-27 05:22:59,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 9.144e+01 9.662e+01 1.039e+02 1.595e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 05:23:14,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3746873.3333333335, ans=0.125 2023-11-27 05:23:18,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3746940.0, ans=0.125 2023-11-27 05:23:23,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3746940.0, ans=0.125 2023-11-27 05:23:25,552 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562050 2023-11-27 05:23:28,622 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8950, loss[loss=0.05364, simple_loss=0.0719, pruned_loss=0.006965, audio_tagging_loss=0.01073, over 14377.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09039, pruned_loss=0.01204, audio_tagging_loss=0.00855, over 3053926.80 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:23:53,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3747140.0, ans=0.2 2023-11-27 05:24:02,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2023-11-27 05:24:07,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3747206.6666666665, ans=0.0 2023-11-27 05:24:20,685 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562100 2023-11-27 05:24:24,294 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9000, loss[loss=0.06128, simple_loss=0.08792, pruned_loss=0.009962, audio_tagging_loss=0.007355, over 15789.00 frames. 
], tot_loss[loss=0.06562, simple_loss=0.09028, pruned_loss=0.01205, audio_tagging_loss=0.008429, over 3052990.57 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:24:24,295 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 05:24:53,958 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4751, 3.8757, 4.3598, 3.6477], device='cuda:1') 2023-11-27 05:24:56,572 INFO [train_asr.py:1267] (1/4) Epoch 47, validation: loss=0.05848, simple_loss=0.05048, pruned_loss=0.005329, audio_tagging_loss=0.02791, over 4681554.00 frames. 2023-11-27 05:24:56,573 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 05:25:15,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3747406.6666666665, ans=0.0 2023-11-27 05:25:23,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.971e+01 9.071e+01 9.540e+01 1.018e+02 1.204e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 05:25:23,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3747473.3333333335, ans=0.05 2023-11-27 05:25:45,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3747606.6666666665, ans=0.125 2023-11-27 05:25:49,083 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562150 2023-11-27 05:25:50,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3747606.6666666665, ans=0.2 2023-11-27 05:25:50,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-27 05:25:52,143 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9050, loss[loss=0.06475, simple_loss=0.08088, pruned_loss=0.01303, audio_tagging_loss=0.01127, over 14984.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08985, pruned_loss=0.01183, audio_tagging_loss=0.008365, over 3060362.53 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:26:02,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2023-11-27 05:26:11,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3747740.0, ans=0.07 2023-11-27 05:26:40,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3747940.0, ans=0.125 2023-11-27 05:26:44,564 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562200 2023-11-27 05:26:48,191 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9100, loss[loss=0.05835, simple_loss=0.08715, pruned_loss=0.005983, audio_tagging_loss=0.00879, over 16268.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08919, pruned_loss=0.01169, audio_tagging_loss=0.008362, over 3061759.46 frames. 
], batch size: 61, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:27:15,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 9.080e+01 9.612e+01 1.021e+02 1.425e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 05:27:16,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3748140.0, ans=0.125 2023-11-27 05:27:26,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3748206.6666666665, ans=0.0 2023-11-27 05:27:40,449 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562250 2023-11-27 05:27:43,546 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9150, loss[loss=0.04853, simple_loss=0.0615, pruned_loss=0.00575, audio_tagging_loss=0.01203, over 15546.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.0891, pruned_loss=0.01168, audio_tagging_loss=0.008405, over 3051453.65 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:28:10,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3748473.3333333335, ans=0.1 2023-11-27 05:28:16,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3748540.0, ans=0.2 2023-11-27 05:28:24,128 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:28:30,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3748606.6666666665, ans=0.5 2023-11-27 05:28:35,590 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562300 2023-11-27 05:28:39,267 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9200, loss[loss=0.08205, simple_loss=0.1169, pruned_loss=0.01681, audio_tagging_loss=0.006822, over 16351.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08958, pruned_loss=0.01191, audio_tagging_loss=0.008443, over 3051679.54 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:28:40,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2023-11-27 05:28:53,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3748740.0, ans=0.0 2023-11-27 05:29:07,164 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 9.097e+01 9.873e+01 1.057e+02 1.295e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-27 05:29:15,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3748873.3333333335, ans=0.125 2023-11-27 05:29:31,601 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562350 2023-11-27 05:29:35,192 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9250, loss[loss=0.06385, simple_loss=0.08014, pruned_loss=0.01252, audio_tagging_loss=0.01125, over 15137.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08835, pruned_loss=0.01171, audio_tagging_loss=0.008463, over 3056366.67 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:29:35,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3749006.6666666665, ans=0.09899494936611666 2023-11-27 05:29:50,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3749073.3333333335, ans=0.0 2023-11-27 05:30:27,454 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562400 2023-11-27 05:30:30,786 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9300, loss[loss=0.06152, simple_loss=0.08613, pruned_loss=0.01098, audio_tagging_loss=0.007478, over 15017.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08861, pruned_loss=0.01182, audio_tagging_loss=0.008429, over 3054601.76 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:30:42,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2023-11-27 05:30:58,403 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 9.060e+01 9.666e+01 1.045e+02 1.304e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 05:31:22,697 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562450 2023-11-27 05:31:23,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3749606.6666666665, ans=0.1 2023-11-27 05:31:25,814 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9350, loss[loss=0.07135, simple_loss=0.09878, pruned_loss=0.0143, audio_tagging_loss=0.007661, over 16214.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08965, pruned_loss=0.01206, audio_tagging_loss=0.008378, over 3052228.42 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:32:04,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0 2023-11-27 05:32:07,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.80 vs. limit=6.0 2023-11-27 05:32:18,290 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562500 2023-11-27 05:32:21,972 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9400, loss[loss=0.0545, simple_loss=0.08226, pruned_loss=0.005558, audio_tagging_loss=0.007815, over 15590.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08991, pruned_loss=0.01227, audio_tagging_loss=0.008507, over 3046270.44 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:32:30,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3750006.6666666665, ans=0.1 2023-11-27 05:32:32,786 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:32:50,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.326e+01 9.875e+01 1.073e+02 1.247e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-27 05:33:03,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3750206.6666666665, ans=0.125 2023-11-27 05:33:14,802 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562550 2023-11-27 05:33:16,810 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:33:17,834 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9450, loss[loss=0.0712, simple_loss=0.107, pruned_loss=0.0113, audio_tagging_loss=0.00642, over 15152.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08957, pruned_loss=0.01222, audio_tagging_loss=0.008685, over 3050801.50 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:33:28,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3750406.6666666665, ans=0.015 2023-11-27 05:33:34,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2023-11-27 05:33:44,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-11-27 05:33:46,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3750473.3333333335, ans=0.0 2023-11-27 05:33:47,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3750473.3333333335, ans=0.125 2023-11-27 05:33:54,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3750540.0, ans=0.125 2023-11-27 05:34:10,314 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562600 2023-11-27 05:34:13,601 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9500, loss[loss=0.05463, simple_loss=0.07245, pruned_loss=0.01083, audio_tagging_loss=0.007578, over 12972.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08939, pruned_loss=0.01219, audio_tagging_loss=0.008752, over 3054891.95 frames. 
], batch size: 52, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:34:25,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3750740.0, ans=0.2 2023-11-27 05:34:36,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3750806.6666666665, ans=0.2 2023-11-27 05:34:41,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3750806.6666666665, ans=0.1 2023-11-27 05:34:42,868 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.626e+01 8.981e+01 9.517e+01 1.036e+02 1.547e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 05:34:47,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2023-11-27 05:34:52,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3750873.3333333335, ans=0.125 2023-11-27 05:34:53,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3750873.3333333335, ans=0.1 2023-11-27 05:35:05,379 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562650 2023-11-27 05:35:08,582 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9550, loss[loss=0.05503, simple_loss=0.071, pruned_loss=0.008118, audio_tagging_loss=0.01141, over 15348.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08958, pruned_loss=0.0121, audio_tagging_loss=0.008793, over 3049781.88 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:35:20,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3751073.3333333335, ans=0.02 2023-11-27 05:35:42,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3751206.6666666665, ans=0.2 2023-11-27 05:36:02,675 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562700 2023-11-27 05:36:05,781 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9600, loss[loss=0.06067, simple_loss=0.08545, pruned_loss=0.008566, audio_tagging_loss=0.009378, over 15168.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08884, pruned_loss=0.01184, audio_tagging_loss=0.008822, over 3050711.43 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:36:15,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=12.0 2023-11-27 05:36:33,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 9.083e+01 9.744e+01 1.053e+02 1.282e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 05:36:40,918 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:36:56,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3751606.6666666665, ans=0.1 2023-11-27 05:36:57,615 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562750 2023-11-27 05:37:00,763 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9650, loss[loss=0.0725, simple_loss=0.09837, pruned_loss=0.01393, audio_tagging_loss=0.009382, over 16175.00 frames. 
], tot_loss[loss=0.06519, simple_loss=0.0888, pruned_loss=0.01198, audio_tagging_loss=0.008805, over 3045420.24 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:37:07,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3751673.3333333335, ans=0.125 2023-11-27 05:37:52,807 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562800 2023-11-27 05:37:56,164 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9700, loss[loss=0.05764, simple_loss=0.06808, pruned_loss=0.01324, audio_tagging_loss=0.01036, over 14845.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08826, pruned_loss=0.01203, audio_tagging_loss=0.008743, over 3046632.01 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:37:57,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3752006.6666666665, ans=0.05 2023-11-27 05:38:00,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3752006.6666666665, ans=0.95 2023-11-27 05:38:03,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-11-27 05:38:22,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3752140.0, ans=0.125 2023-11-27 05:38:24,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3752140.0, ans=0.04949747468305833 2023-11-27 05:38:25,190 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.968e+01 9.559e+01 1.024e+02 1.547e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 05:38:40,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3752273.3333333335, ans=0.05 2023-11-27 05:38:49,069 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562850 2023-11-27 05:38:52,141 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9750, loss[loss=0.07975, simple_loss=0.1133, pruned_loss=0.01602, audio_tagging_loss=0.007075, over 15203.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08833, pruned_loss=0.01205, audio_tagging_loss=0.008637, over 3045484.96 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:38:52,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3752340.0, ans=0.125 2023-11-27 05:39:01,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3752406.6666666665, ans=0.0 2023-11-27 05:39:31,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. 
limit=15.0 2023-11-27 05:39:38,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3752606.6666666665, ans=0.1 2023-11-27 05:39:44,014 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562900 2023-11-27 05:39:44,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3752606.6666666665, ans=0.2 2023-11-27 05:39:47,152 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9800, loss[loss=0.03941, simple_loss=0.04826, pruned_loss=0.005244, audio_tagging_loss=0.01003, over 15053.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08837, pruned_loss=0.01216, audio_tagging_loss=0.008571, over 3048600.35 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:39:55,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3752673.3333333335, ans=0.2 2023-11-27 05:39:59,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3752740.0, ans=0.125 2023-11-27 05:40:04,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3752740.0, ans=0.0 2023-11-27 05:40:11,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3752806.6666666665, ans=0.07 2023-11-27 05:40:14,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3752806.6666666665, ans=0.125 2023-11-27 05:40:17,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.828e+01 9.556e+01 1.035e+02 1.179e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-27 05:40:18,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3752806.6666666665, ans=0.05 2023-11-27 05:40:22,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3752873.3333333335, ans=0.0 2023-11-27 05:40:36,818 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:40:39,077 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562950 2023-11-27 05:40:42,208 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9850, loss[loss=0.07418, simple_loss=0.0977, pruned_loss=0.01662, audio_tagging_loss=0.008711, over 15523.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08921, pruned_loss=0.01213, audio_tagging_loss=0.008498, over 3040723.33 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:40:44,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3753006.6666666665, ans=0.025 2023-11-27 05:41:04,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3753140.0, ans=0.125 2023-11-27 05:41:18,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3753206.6666666665, ans=0.125 2023-11-27 05:41:23,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3753206.6666666665, ans=0.0 2023-11-27 05:41:23,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.49 vs. limit=10.0 2023-11-27 05:41:27,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3753273.3333333335, ans=0.125 2023-11-27 05:41:34,380 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563000 2023-11-27 05:41:34,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3753273.3333333335, ans=0.125 2023-11-27 05:41:38,619 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9900, loss[loss=0.06239, simple_loss=0.07832, pruned_loss=0.01319, audio_tagging_loss=0.01004, over 14932.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.0886, pruned_loss=0.01195, audio_tagging_loss=0.008441, over 3036462.77 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:41:59,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3753473.3333333335, ans=0.0 2023-11-27 05:42:00,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=12.0 2023-11-27 05:42:07,505 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.022e+01 9.450e+01 1.050e+02 2.788e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-27 05:42:12,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3753540.0, ans=0.2 2023-11-27 05:42:12,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3753540.0, ans=0.0 2023-11-27 05:42:31,030 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563050 2023-11-27 05:42:34,149 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9950, loss[loss=0.05492, simple_loss=0.07124, pruned_loss=0.01048, audio_tagging_loss=0.008821, over 15237.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.08784, pruned_loss=0.01179, audio_tagging_loss=0.008403, over 3039733.53 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:42:38,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3753673.3333333335, ans=15.0 2023-11-27 05:42:42,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3753673.3333333335, ans=0.0 2023-11-27 05:42:53,869 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:43:00,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3753806.6666666665, ans=0.125 2023-11-27 05:43:07,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3753873.3333333335, ans=0.0 2023-11-27 05:43:17,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3753940.0, ans=0.0 2023-11-27 05:43:26,180 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563100 2023-11-27 05:43:27,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3753940.0, ans=0.125 2023-11-27 05:43:27,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3753940.0, ans=0.1 2023-11-27 05:43:29,297 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10000, loss[loss=0.06889, simple_loss=0.09716, pruned_loss=0.01327, audio_tagging_loss=0.007036, over 16152.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08809, pruned_loss=0.01175, audio_tagging_loss=0.008369, over 3047933.15 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:43:40,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3754073.3333333335, ans=0.0 2023-11-27 05:43:59,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.953e+01 9.636e+01 1.038e+02 1.515e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 05:44:01,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3754140.0, ans=0.1 2023-11-27 05:44:07,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3754206.6666666665, ans=0.125 2023-11-27 05:44:10,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3754206.6666666665, ans=0.0 2023-11-27 05:44:13,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0 2023-11-27 05:44:17,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3754273.3333333335, ans=0.07 2023-11-27 05:44:21,984 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563150 2023-11-27 05:44:25,015 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10050, loss[loss=0.08116, simple_loss=0.113, pruned_loss=0.01803, audio_tagging_loss=0.006607, over 15931.00 frames. ], tot_loss[loss=0.06392, simple_loss=0.08774, pruned_loss=0.01166, audio_tagging_loss=0.008386, over 3050519.85 frames. 
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:44:49,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3754473.3333333335, ans=0.2 2023-11-27 05:44:51,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3754473.3333333335, ans=0.2 2023-11-27 05:44:53,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0 2023-11-27 05:44:55,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3754473.3333333335, ans=0.125 2023-11-27 05:44:56,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3754473.3333333335, ans=0.05 2023-11-27 05:45:07,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3754540.0, ans=0.1 2023-11-27 05:45:08,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5 2023-11-27 05:45:16,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3754606.6666666665, ans=0.0 2023-11-27 05:45:18,013 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563200 2023-11-27 05:45:21,481 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10100, loss[loss=0.06736, simple_loss=0.0973, pruned_loss=0.01202, audio_tagging_loss=0.006697, over 15788.00 frames. ], tot_loss[loss=0.06309, simple_loss=0.08649, pruned_loss=0.01134, audio_tagging_loss=0.008505, over 3045850.75 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:45:41,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3754740.0, ans=0.2 2023-11-27 05:45:51,640 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.906e+01 9.588e+01 1.046e+02 1.335e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 05:45:55,554 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:46:05,975 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:46:13,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563250 2023-11-27 05:46:14,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3754940.0, ans=0.2 2023-11-27 05:46:17,012 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10150, loss[loss=0.05565, simple_loss=0.07482, pruned_loss=0.008963, audio_tagging_loss=0.00928, over 14119.00 frames. ], tot_loss[loss=0.06361, simple_loss=0.08704, pruned_loss=0.01147, audio_tagging_loss=0.008622, over 3048078.25 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:46:17,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3755006.6666666665, ans=0.2 2023-11-27 05:46:28,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3755073.3333333335, ans=0.125 2023-11-27 05:46:32,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3755073.3333333335, ans=0.5 2023-11-27 05:46:40,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3755140.0, ans=0.2 2023-11-27 05:46:42,911 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:46:52,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3755206.6666666665, ans=0.125 2023-11-27 05:47:02,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3755273.3333333335, ans=0.0 2023-11-27 05:47:09,459 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563300 2023-11-27 05:47:10,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3755273.3333333335, ans=0.04949747468305833 2023-11-27 05:47:12,629 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10200, loss[loss=0.05974, simple_loss=0.07718, pruned_loss=0.01241, audio_tagging_loss=0.008735, over 15815.00 frames. ], tot_loss[loss=0.06358, simple_loss=0.08669, pruned_loss=0.01157, audio_tagging_loss=0.008665, over 3050213.00 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:47:22,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3755340.0, ans=0.05 2023-11-27 05:47:25,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3755406.6666666665, ans=0.1 2023-11-27 05:47:26,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0 2023-11-27 05:47:29,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3755406.6666666665, ans=0.125 2023-11-27 05:47:32,875 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 05:47:33,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2023-11-27 05:47:34,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3755473.3333333335, ans=0.07 2023-11-27 05:47:42,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.968e+01 9.089e+01 9.735e+01 1.032e+02 1.277e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 05:47:43,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3755473.3333333335, ans=0.1 2023-11-27 05:48:03,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3755606.6666666665, ans=0.125 2023-11-27 05:48:05,864 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563350 2023-11-27 05:48:08,975 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10250, loss[loss=0.0676, simple_loss=0.09256, pruned_loss=0.01167, audio_tagging_loss=0.009656, over 14198.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08809, pruned_loss=0.01176, audio_tagging_loss=0.008689, over 3056245.87 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:48:59,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3755940.0, ans=0.125 2023-11-27 05:49:00,916 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563400 2023-11-27 05:49:04,799 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10300, loss[loss=0.06814, simple_loss=0.08575, pruned_loss=0.01324, audio_tagging_loss=0.01203, over 16324.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08924, pruned_loss=0.01206, audio_tagging_loss=0.008693, over 3056604.25 frames. ], batch size: 63, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:49:16,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3756073.3333333335, ans=0.0 2023-11-27 05:49:34,848 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.184e+01 9.721e+01 1.033e+02 1.459e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 05:49:47,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3756206.6666666665, ans=0.0 2023-11-27 05:49:51,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3756273.3333333335, ans=0.125 2023-11-27 05:49:52,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3756273.3333333335, ans=0.125 2023-11-27 05:49:56,709 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563450 2023-11-27 05:49:56,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3756273.3333333335, ans=0.2 2023-11-27 05:50:00,361 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10350, loss[loss=0.07574, simple_loss=0.1016, pruned_loss=0.01437, audio_tagging_loss=0.01055, over 15667.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.0893, pruned_loss=0.01204, audio_tagging_loss=0.008821, over 3059711.70 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:50:02,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3756340.0, ans=0.0 2023-11-27 05:50:11,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3756406.6666666665, ans=0.09899494936611666 2023-11-27 05:50:31,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3756473.3333333335, ans=0.125 2023-11-27 05:50:38,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=22.5 2023-11-27 05:50:49,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3756606.6666666665, ans=0.07 2023-11-27 05:50:53,230 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563500 2023-11-27 05:50:55,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3756673.3333333335, ans=0.0 2023-11-27 05:50:56,292 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10400, loss[loss=0.08014, simple_loss=0.1196, pruned_loss=0.01195, audio_tagging_loss=0.00838, over 16316.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08893, pruned_loss=0.01198, audio_tagging_loss=0.009074, over 3061614.89 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:50:56,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3756673.3333333335, ans=0.0 2023-11-27 05:51:08,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3756740.0, ans=0.125 2023-11-27 05:51:26,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.687e+01 8.939e+01 9.704e+01 1.060e+02 1.471e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 05:51:48,446 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563550 2023-11-27 05:51:51,595 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10450, loss[loss=0.05028, simple_loss=0.063, pruned_loss=0.009804, audio_tagging_loss=0.008972, over 17016.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08978, pruned_loss=0.01211, audio_tagging_loss=0.008942, over 3061844.02 frames. ], batch size: 66, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:52:01,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3757073.3333333335, ans=0.1 2023-11-27 05:52:02,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=15.0 2023-11-27 05:52:20,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3757140.0, ans=0.125 2023-11-27 05:52:27,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2023-11-27 05:52:36,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3757273.3333333335, ans=0.2 2023-11-27 05:52:44,134 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563600 2023-11-27 05:52:47,995 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10500, loss[loss=0.06566, simple_loss=0.086, pruned_loss=0.01464, audio_tagging_loss=0.008015, over 14461.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08948, pruned_loss=0.01207, audio_tagging_loss=0.008823, over 3056455.71 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:53:05,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-11-27 05:53:11,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3757473.3333333335, ans=0.1 2023-11-27 05:53:17,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.057e+01 9.549e+01 1.045e+02 1.272e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 05:53:40,868 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563650 2023-11-27 05:53:43,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3757673.3333333335, ans=0.125 2023-11-27 05:53:43,941 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10550, loss[loss=0.07083, simple_loss=0.1014, pruned_loss=0.01185, audio_tagging_loss=0.008281, over 15320.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08946, pruned_loss=0.01181, audio_tagging_loss=0.008669, over 3053751.80 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:53:48,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2023-11-27 05:53:53,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=12.0 2023-11-27 05:53:56,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. 
limit=15.0 2023-11-27 05:53:58,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3757740.0, ans=0.1 2023-11-27 05:54:10,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3757806.6666666665, ans=0.125 2023-11-27 05:54:13,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3757806.6666666665, ans=0.125 2023-11-27 05:54:20,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3757873.3333333335, ans=0.1 2023-11-27 05:54:27,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3757940.0, ans=0.2 2023-11-27 05:54:28,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3757940.0, ans=0.05 2023-11-27 05:54:36,093 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563700 2023-11-27 05:54:39,299 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10600, loss[loss=0.07618, simple_loss=0.09852, pruned_loss=0.01821, audio_tagging_loss=0.008709, over 15539.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09022, pruned_loss=0.01196, audio_tagging_loss=0.00852, over 3045688.57 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:54:44,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3758006.6666666665, ans=0.125 2023-11-27 05:54:53,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3758073.3333333335, ans=0.0 2023-11-27 05:55:10,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.995e+01 9.549e+01 1.050e+02 1.253e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 05:55:30,987 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563750 2023-11-27 05:55:34,660 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10650, loss[loss=0.06103, simple_loss=0.08435, pruned_loss=0.009055, audio_tagging_loss=0.009796, over 15065.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08931, pruned_loss=0.01188, audio_tagging_loss=0.008546, over 3036066.36 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:55:38,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3758340.0, ans=0.0 2023-11-27 05:55:53,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. 
limit=15.0 2023-11-27 05:55:57,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3758473.3333333335, ans=0.5 2023-11-27 05:56:06,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3758473.3333333335, ans=0.2 2023-11-27 05:56:13,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3758540.0, ans=0.1 2023-11-27 05:56:23,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3758606.6666666665, ans=0.1 2023-11-27 05:56:27,490 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563800 2023-11-27 05:56:31,178 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10700, loss[loss=0.07249, simple_loss=0.1065, pruned_loss=0.01198, audio_tagging_loss=0.007268, over 14390.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08901, pruned_loss=0.01169, audio_tagging_loss=0.008492, over 3025349.04 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:56:31,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3758673.3333333335, ans=0.125 2023-11-27 05:56:35,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3758673.3333333335, ans=0.125 2023-11-27 05:56:38,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3758673.3333333335, ans=0.125 2023-11-27 05:56:45,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3758740.0, ans=0.0 2023-11-27 05:56:45,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3758740.0, ans=0.0 2023-11-27 05:56:50,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3758740.0, ans=0.125 2023-11-27 05:56:50,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3758740.0, ans=0.0 2023-11-27 05:57:01,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.877e+01 9.430e+01 1.025e+02 1.516e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 05:57:11,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.89 vs. limit=5.0 2023-11-27 05:57:19,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3758940.0, ans=0.125 2023-11-27 05:57:22,425 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563850 2023-11-27 05:57:25,455 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10750, loss[loss=0.06234, simple_loss=0.08935, pruned_loss=0.007978, audio_tagging_loss=0.009685, over 14883.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08915, pruned_loss=0.0116, audio_tagging_loss=0.008549, over 3028857.50 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:57:36,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3759073.3333333335, ans=0.125 2023-11-27 05:57:36,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3759073.3333333335, ans=0.125 2023-11-27 05:57:57,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.61 vs. limit=22.5 2023-11-27 05:57:59,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=12.0 2023-11-27 05:57:59,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3759206.6666666665, ans=0.1 2023-11-27 05:58:17,682 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563900 2023-11-27 05:58:20,832 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10800, loss[loss=0.04995, simple_loss=0.06705, pruned_loss=0.008106, audio_tagging_loss=0.008316, over 15161.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08924, pruned_loss=0.01159, audio_tagging_loss=0.008387, over 3032893.31 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:58:52,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 9.069e+01 9.752e+01 1.037e+02 1.652e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 05:58:56,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3759540.0, ans=0.05 2023-11-27 05:59:09,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3759606.6666666665, ans=0.0 2023-11-27 05:59:10,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5 2023-11-27 05:59:14,069 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563950 2023-11-27 05:59:17,770 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10850, loss[loss=0.06309, simple_loss=0.09055, pruned_loss=0.009223, audio_tagging_loss=0.008598, over 15413.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08899, pruned_loss=0.01181, audio_tagging_loss=0.008482, over 3030132.88 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:59:44,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3759806.6666666665, ans=0.0 2023-11-27 05:59:50,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3759873.3333333335, ans=0.0 2023-11-27 05:59:54,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3759873.3333333335, ans=0.0 2023-11-27 06:00:09,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0 2023-11-27 06:00:10,332 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:00:10,385 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564000 2023-11-27 06:00:15,718 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10900, loss[loss=0.05628, simple_loss=0.07762, pruned_loss=0.008593, audio_tagging_loss=0.008878, over 15747.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08901, pruned_loss=0.01178, audio_tagging_loss=0.008527, over 3028274.76 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:00:32,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3760073.3333333335, ans=0.125 2023-11-27 06:00:42,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3760140.0, ans=0.0 2023-11-27 06:00:47,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.933e+01 9.704e+01 1.050e+02 1.255e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 06:01:02,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3760273.3333333335, ans=0.125 2023-11-27 06:01:07,494 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564050 2023-11-27 06:01:09,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=15.0 2023-11-27 06:01:10,571 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10950, loss[loss=0.06464, simple_loss=0.08576, pruned_loss=0.01181, audio_tagging_loss=0.009944, over 15786.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08825, pruned_loss=0.01179, audio_tagging_loss=0.008665, over 3027959.60 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:01:26,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-11-27 06:01:32,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3760406.6666666665, ans=0.0 2023-11-27 06:02:03,197 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564100 2023-11-27 06:02:06,789 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11000, loss[loss=0.06546, simple_loss=0.08982, pruned_loss=0.01274, audio_tagging_loss=0.007814, over 15107.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08785, pruned_loss=0.01167, audio_tagging_loss=0.008712, over 3033307.91 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:02:12,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760673.3333333335, ans=0.1 2023-11-27 06:02:15,273 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 06:02:33,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3760806.6666666665, ans=0.09899494936611666 2023-11-27 06:02:38,496 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.834e+01 9.840e+01 1.033e+02 1.285e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 06:02:41,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760873.3333333335, ans=0.1 2023-11-27 06:02:57,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3760940.0, ans=0.125 2023-11-27 06:02:59,701 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564150 2023-11-27 06:03:02,828 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11050, loss[loss=0.06459, simple_loss=0.08133, pruned_loss=0.01219, audio_tagging_loss=0.01173, over 15210.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08772, pruned_loss=0.01176, audio_tagging_loss=0.008816, over 3032608.29 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:03:32,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.22 vs. limit=10.0 2023-11-27 06:03:54,124 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564200 2023-11-27 06:03:57,490 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11100, loss[loss=0.07134, simple_loss=0.08904, pruned_loss=0.01866, audio_tagging_loss=0.00816, over 14235.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08803, pruned_loss=0.01181, audio_tagging_loss=0.008926, over 3040361.16 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:04:03,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3761340.0, ans=0.1 2023-11-27 06:04:30,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 9.171e+01 9.681e+01 1.044e+02 1.229e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 06:04:38,794 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:04:49,829 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564250 2023-11-27 06:04:52,952 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11150, loss[loss=0.05442, simple_loss=0.06435, pruned_loss=0.01231, audio_tagging_loss=0.009934, over 15361.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08845, pruned_loss=0.01186, audio_tagging_loss=0.008997, over 3044188.93 frames. 
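
Note: the WARNING entries above drop one-second AudioSet cuts whose placeholder transcript (24 tokens) is longer than the subsampled feature sequence (23 frames), a combination the transducer loss cannot align. A minimal sketch of such a filter, assuming the usual ((T - 7) // 2 + 1) // 2 convolutional subsampling arithmetic, which reproduces the logged 100 -> 23:

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed front-end arithmetic; reproduces the logged 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer alignment needs at least one output frame per token;
    # the exact margin used by the recipe is a detail not shown in the log.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> the cut is excluded
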
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:05:23,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3761806.6666666665, ans=0.2 2023-11-27 06:05:23,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3761806.6666666665, ans=0.09899494936611666 2023-11-27 06:05:26,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3761873.3333333335, ans=0.1 2023-11-27 06:05:35,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3761873.3333333335, ans=0.0 2023-11-27 06:05:45,968 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564300 2023-11-27 06:05:49,610 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11200, loss[loss=0.07256, simple_loss=0.09943, pruned_loss=0.01392, audio_tagging_loss=0.008921, over 16010.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08877, pruned_loss=0.01194, audio_tagging_loss=0.008969, over 3048971.15 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:06:20,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3762140.0, ans=0.0 2023-11-27 06:06:22,459 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.923e+01 9.248e+01 9.841e+01 1.044e+02 1.353e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 06:06:22,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2023-11-27 06:06:26,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3762206.6666666665, ans=0.0 2023-11-27 06:06:33,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3762273.3333333335, ans=0.1 2023-11-27 06:06:33,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3762273.3333333335, ans=0.1 2023-11-27 06:06:37,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3762273.3333333335, ans=0.125 2023-11-27 06:06:38,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3762273.3333333335, ans=0.125 2023-11-27 06:06:39,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-27 06:06:41,704 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564350 2023-11-27 06:06:44,804 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11250, loss[loss=0.07597, simple_loss=0.09923, pruned_loss=0.0169, audio_tagging_loss=0.009451, over 15689.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08959, pruned_loss=0.01203, audio_tagging_loss=0.008857, over 3052156.18 frames. 
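
Note: the scaling.py:213 entries record ScheduledFloat values (dropout probabilities, skip rates, balancer limits such as min_abs and scale_min) that are deterministic functions of batch_count. A minimal sketch of a piecewise-linear schedule of that kind; the breakpoints below are illustrative, not the ones used by this model:

import bisect

class PiecewiseLinearSchedule:
    # Value interpolated linearly between (batch_count, value) breakpoints,
    # clamped at both ends. The breakpoints here are made up for illustration.
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count) - 1
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches:
dropout_p = PiecewiseLinearSchedule((0, 0.3), (20_000, 0.1))
print(dropout_p(3_758_740.0))  # 0.1, long past the last breakpoint
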
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:06:48,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3762340.0, ans=0.0 2023-11-27 06:06:49,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3762340.0, ans=0.125 2023-11-27 06:07:07,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3762473.3333333335, ans=0.07 2023-11-27 06:07:09,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3762473.3333333335, ans=0.0 2023-11-27 06:07:14,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3762473.3333333335, ans=0.0 2023-11-27 06:07:23,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3762540.0, ans=0.125 2023-11-27 06:07:26,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.76 vs. limit=15.0 2023-11-27 06:07:36,584 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564400 2023-11-27 06:07:40,541 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11300, loss[loss=0.06019, simple_loss=0.08966, pruned_loss=0.009808, audio_tagging_loss=0.005547, over 14609.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08972, pruned_loss=0.01201, audio_tagging_loss=0.008701, over 3048039.22 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:07:41,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3762673.3333333335, ans=0.2 2023-11-27 06:07:43,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3762673.3333333335, ans=0.125 2023-11-27 06:07:54,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=12.0 2023-11-27 06:08:03,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3762806.6666666665, ans=0.125 2023-11-27 06:08:13,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 9.028e+01 9.588e+01 1.026e+02 1.216e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:08:33,494 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564450 2023-11-27 06:08:34,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3762940.0, ans=0.125 2023-11-27 06:08:36,664 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11350, loss[loss=0.0798, simple_loss=0.1152, pruned_loss=0.01435, audio_tagging_loss=0.00787, over 16816.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08902, pruned_loss=0.01182, audio_tagging_loss=0.008638, over 3054685.22 frames. 
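
Note: the optim.py:476 records print five order statistics (min, 25%, median, 75%, max) of recent gradient norms next to the active clipping threshold, and throughout this stretch the threshold is exactly 2x the printed median (e.g. threshold=1.918e+02 against the 9.588e+01 median just above), matching Clipping_scale=2.0. A sketch of median-based adaptive clipping under that reading; this is the behaviour suggested by the numbers, not the actual optim.py code:

from collections import deque
import statistics
import torch

class AdaptiveClipper:
    # Clip gradients to clipping_scale * median of recently seen grad norms.
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0
        self.num_seen = 0

    def __call__(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        if not params:
            return 0.0
        norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params])
        ).item()
        self.norms.append(norm)
        threshold = self.scale * statistics.median(self.norms)
        self.num_seen += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return 100.0 * self.num_clipped / self.num_seen  # percent-clipped
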
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:08:40,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3763006.6666666665, ans=0.1 2023-11-27 06:08:47,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=22.5 2023-11-27 06:08:52,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3763073.3333333335, ans=0.05 2023-11-27 06:09:00,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3763140.0, ans=0.2 2023-11-27 06:09:01,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3763140.0, ans=0.04949747468305833 2023-11-27 06:09:03,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3763140.0, ans=0.125 2023-11-27 06:09:20,187 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:09:22,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3763273.3333333335, ans=0.0 2023-11-27 06:09:28,878 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564500 2023-11-27 06:09:30,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3763273.3333333335, ans=0.0 2023-11-27 06:09:31,968 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11400, loss[loss=0.05734, simple_loss=0.07738, pruned_loss=0.01091, audio_tagging_loss=0.007737, over 14890.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08952, pruned_loss=0.01194, audio_tagging_loss=0.008538, over 3048830.46 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:09:37,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3763340.0, ans=10.0 2023-11-27 06:09:40,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3763340.0, ans=0.125 2023-11-27 06:09:47,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3763406.6666666665, ans=0.1 2023-11-27 06:09:53,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3763473.3333333335, ans=0.125 2023-11-27 06:10:05,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 9.070e+01 9.706e+01 1.041e+02 1.301e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 06:10:10,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.18 vs. 
limit=15.0 2023-11-27 06:10:15,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3763606.6666666665, ans=0.04949747468305833 2023-11-27 06:10:22,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3763606.6666666665, ans=0.0 2023-11-27 06:10:23,646 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564550 2023-11-27 06:10:27,261 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11450, loss[loss=0.07351, simple_loss=0.1028, pruned_loss=0.0148, audio_tagging_loss=0.007317, over 14535.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08971, pruned_loss=0.01199, audio_tagging_loss=0.008503, over 3045431.83 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:10:31,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3763673.3333333335, ans=0.0 2023-11-27 06:10:43,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3763740.0, ans=0.0 2023-11-27 06:10:49,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3763806.6666666665, ans=0.0 2023-11-27 06:10:53,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3763806.6666666665, ans=0.2 2023-11-27 06:11:09,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3763873.3333333335, ans=0.95 2023-11-27 06:11:19,514 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564600 2023-11-27 06:11:23,087 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11500, loss[loss=0.06281, simple_loss=0.09312, pruned_loss=0.008542, audio_tagging_loss=0.007714, over 15190.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08965, pruned_loss=0.01197, audio_tagging_loss=0.008502, over 3053452.97 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:11:39,764 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:11:46,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=22.5 2023-11-27 06:11:55,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 8.975e+01 9.589e+01 1.045e+02 1.244e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:11:59,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3764206.6666666665, ans=0.04949747468305833 2023-11-27 06:12:00,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3764206.6666666665, ans=0.125 2023-11-27 06:12:08,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.06 vs. 
limit=15.0 2023-11-27 06:12:14,890 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564650 2023-11-27 06:12:16,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3764273.3333333335, ans=0.0 2023-11-27 06:12:18,037 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11550, loss[loss=0.07128, simple_loss=0.09855, pruned_loss=0.01178, audio_tagging_loss=0.01023, over 14735.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08917, pruned_loss=0.01193, audio_tagging_loss=0.00849, over 3053013.95 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 06:12:19,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3764340.0, ans=0.125 2023-11-27 06:12:24,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3764340.0, ans=0.2 2023-11-27 06:12:36,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3764406.6666666665, ans=0.0 2023-11-27 06:12:39,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3764473.3333333335, ans=0.125 2023-11-27 06:12:52,042 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:13:07,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=12.0 2023-11-27 06:13:10,526 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564700 2023-11-27 06:13:10,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3764606.6666666665, ans=0.09899494936611666 2023-11-27 06:13:13,680 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11600, loss[loss=0.06188, simple_loss=0.08291, pruned_loss=0.008355, audio_tagging_loss=0.01207, over 15809.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08977, pruned_loss=0.01207, audio_tagging_loss=0.008543, over 3055250.48 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:13:20,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=22.5 2023-11-27 06:13:29,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3764740.0, ans=0.0 2023-11-27 06:13:45,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. 
limit=15.0 2023-11-27 06:13:48,657 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.097e+01 9.719e+01 1.053e+02 1.388e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 06:13:58,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3764940.0, ans=0.125 2023-11-27 06:14:06,630 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564750 2023-11-27 06:14:09,725 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11650, loss[loss=0.09421, simple_loss=0.1339, pruned_loss=0.01919, audio_tagging_loss=0.008071, over 15987.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08994, pruned_loss=0.0121, audio_tagging_loss=0.008604, over 3048888.58 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:14:12,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3765006.6666666665, ans=0.125 2023-11-27 06:14:31,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2023-11-27 06:14:37,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2023-11-27 06:14:52,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3765206.6666666665, ans=0.0 2023-11-27 06:14:56,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3765273.3333333335, ans=0.125 2023-11-27 06:15:01,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3765273.3333333335, ans=0.0 2023-11-27 06:15:02,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564800 2023-11-27 06:15:03,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3765273.3333333335, ans=0.125 2023-11-27 06:15:04,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3765340.0, ans=0.125 2023-11-27 06:15:05,648 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11700, loss[loss=0.06869, simple_loss=0.09904, pruned_loss=0.01238, audio_tagging_loss=0.006785, over 13916.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08987, pruned_loss=0.01221, audio_tagging_loss=0.008681, over 3037760.35 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:15:09,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3765340.0, ans=0.1 2023-11-27 06:15:11,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.45 vs. limit=22.5 2023-11-27 06:15:18,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3765406.6666666665, ans=0.125 2023-11-27 06:15:18,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. 
limit=15.0 2023-11-27 06:15:27,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3765473.3333333335, ans=0.125 2023-11-27 06:15:30,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3765473.3333333335, ans=0.0 2023-11-27 06:15:38,940 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:15:40,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.890e+01 9.642e+01 1.040e+02 1.676e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 06:15:42,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2023-11-27 06:15:48,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3765540.0, ans=0.1 2023-11-27 06:15:48,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2023-11-27 06:15:58,270 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564850 2023-11-27 06:16:00,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3765673.3333333335, ans=0.0 2023-11-27 06:16:01,370 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11750, loss[loss=0.07168, simple_loss=0.1006, pruned_loss=0.01252, audio_tagging_loss=0.008882, over 15977.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09003, pruned_loss=0.01219, audio_tagging_loss=0.00864, over 3038603.60 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:16:03,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3765673.3333333335, ans=0.125 2023-11-27 06:16:06,280 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:16:29,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3765806.6666666665, ans=0.0 2023-11-27 06:16:54,078 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564900 2023-11-27 06:16:57,160 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11800, loss[loss=0.05381, simple_loss=0.07087, pruned_loss=0.009622, audio_tagging_loss=0.008753, over 15258.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08977, pruned_loss=0.01221, audio_tagging_loss=0.008636, over 3043752.20 frames. 
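
Note: the scaling.py:1022 Whitening lines fire when an activation-whiteness metric crosses its limit (e.g. metric=12.22 vs. limit=15.0 above). One plausible formulation of such a metric is mean(eig^2) / mean(eig)^2 over the eigenvalues of the channel covariance: it equals 1.0 for a perfectly white (isotropic) covariance, grows as the spectrum concentrates, and needs no eigendecomposition. This is a sketch of the idea, not the exact scaling.py code; the num_groups field in the log suggests the real metric is computed per channel group:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2 of
    # the channel covariance; 1.0 means perfectly white, larger means less so.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    d = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()   # trace(C) / d
    mean_eig_sq = (cov * cov).sum() / d     # trace(C @ C) / d = mean(eig^2)
    return (mean_eig_sq / mean_eig ** 2).item()

white = torch.randn(10_000, 128)
print(whitening_metric(white))                    # ~1.0
print(whitening_metric(white * torch.rand(128)))  # >1.0, less white
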
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:16:57,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3766006.6666666665, ans=0.125 2023-11-27 06:17:31,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.000e+01 8.890e+01 9.734e+01 1.045e+02 1.276e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 06:17:34,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3766206.6666666665, ans=0.125 2023-11-27 06:17:34,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3766206.6666666665, ans=0.125 2023-11-27 06:17:43,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3766273.3333333335, ans=0.125 2023-11-27 06:17:46,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3766273.3333333335, ans=0.125 2023-11-27 06:17:49,607 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564950 2023-11-27 06:17:52,738 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11850, loss[loss=0.08298, simple_loss=0.1106, pruned_loss=0.01928, audio_tagging_loss=0.008423, over 14244.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.0901, pruned_loss=0.01218, audio_tagging_loss=0.008728, over 3047920.55 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:17:55,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3766340.0, ans=0.0 2023-11-27 06:17:58,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3766340.0, ans=0.0 2023-11-27 06:18:14,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3766473.3333333335, ans=0.0 2023-11-27 06:18:14,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3766473.3333333335, ans=0.125 2023-11-27 06:18:20,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3766473.3333333335, ans=0.0 2023-11-27 06:18:21,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3766473.3333333335, ans=0.1 2023-11-27 06:18:44,747 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565000 2023-11-27 06:18:47,389 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:18:48,180 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11900, loss[loss=0.06699, simple_loss=0.09635, pruned_loss=0.01068, audio_tagging_loss=0.008137, over 15843.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08991, pruned_loss=0.0121, audio_tagging_loss=0.008815, over 3043508.13 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:18:50,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.95 vs. 
limit=12.0 2023-11-27 06:18:52,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3766673.3333333335, ans=0.125 2023-11-27 06:18:57,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3766673.3333333335, ans=0.125 2023-11-27 06:19:23,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.781e+01 9.449e+01 1.024e+02 1.260e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 06:19:30,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3766873.3333333335, ans=0.125 2023-11-27 06:19:34,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3766940.0, ans=0.1 2023-11-27 06:19:41,266 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565050 2023-11-27 06:19:44,895 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11950, loss[loss=0.06395, simple_loss=0.08731, pruned_loss=0.01127, audio_tagging_loss=0.009019, over 14550.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08929, pruned_loss=0.01192, audio_tagging_loss=0.008856, over 3046697.01 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:19:48,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3767006.6666666665, ans=0.125 2023-11-27 06:20:07,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0 2023-11-27 06:20:07,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3767140.0, ans=0.04949747468305833 2023-11-27 06:20:16,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3767206.6666666665, ans=0.0 2023-11-27 06:20:19,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2023-11-27 06:20:30,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3767273.3333333335, ans=0.0 2023-11-27 06:20:35,353 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565100 2023-11-27 06:20:38,327 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 12000, loss[loss=0.07409, simple_loss=0.1063, pruned_loss=0.01326, audio_tagging_loss=0.007661, over 14617.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08899, pruned_loss=0.01196, audio_tagging_loss=0.008986, over 3054846.85 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:20:38,327 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 06:20:58,161 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3251, 4.2845, 4.4861, 4.4698], device='cuda:1') 2023-11-27 06:21:10,512 INFO [train_asr.py:1267] (1/4) Epoch 47, validation: loss=0.0578, simple_loss=0.05045, pruned_loss=0.005285, audio_tagging_loss=0.02729, over 4681554.00 frames. 
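
Note: each validation pass ends with the allocator's high-water mark, as in the 'Maximum memory allocated so far is 25568MB' record that follows. That figure is presumably the peak byte count from torch.cuda.max_memory_allocated, converted to MB; a minimal sketch:

import torch

def log_peak_memory(device: torch.device) -> None:
    # Peak bytes allocated since process start (or since the last
    # torch.cuda.reset_peak_memory_stats call), reported in MB.
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
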
2023-11-27 06:21:10,512 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 06:21:20,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3767406.6666666665, ans=0.0 2023-11-27 06:22:01,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3767493.3333333335, ans=0.125 2023-11-27 06:22:02,643 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 0, loss[loss=0.05896, simple_loss=0.05941, pruned_loss=0.00647, audio_tagging_loss=0.02279, over 15568.00 frames. ], tot_loss[loss=0.05896, simple_loss=0.05941, pruned_loss=0.00647, audio_tagging_loss=0.02279, over 15568.00 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:22:02,644 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 06:22:20,760 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1386, 2.4346, 5.0628, 2.9838], device='cuda:1') 2023-11-27 06:22:26,136 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4572, 3.7918, 3.1225, 3.7862], device='cuda:1') 2023-11-27 06:22:31,411 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9650, 3.1164, 2.9237, 3.1746, 3.3784, 2.7787, 3.4193, 2.5225], device='cuda:1') 2023-11-27 06:22:33,981 INFO [train_asr.py:1267] (1/4) Epoch 48, validation: loss=0.05791, simple_loss=0.05045, pruned_loss=0.005281, audio_tagging_loss=0.02741, over 4681554.00 frames. 2023-11-27 06:22:33,982 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 06:22:39,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3767493.3333333335, ans=0.0 2023-11-27 06:22:43,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5 2023-11-27 06:22:43,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 9.223e+01 9.944e+01 1.084e+02 1.467e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-27 06:22:46,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.62 vs. limit=6.0 2023-11-27 06:22:50,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3767560.0, ans=0.125 2023-11-27 06:22:52,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3767560.0, ans=0.0 2023-11-27 06:23:00,877 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565150 2023-11-27 06:23:29,997 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 50, loss[loss=0.0678, simple_loss=0.07757, pruned_loss=0.008399, audio_tagging_loss=0.02062, over 14082.00 frames. ], tot_loss[loss=0.07165, simple_loss=0.08577, pruned_loss=0.01168, audio_tagging_loss=0.01708, over 689695.91 frames. 
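
Note: the 'over N frames' counter attached to tot_loss ramps up after the epoch boundary (about 0.69M frames at epoch 48 batch 50, 1.2M at batch 100) and then plateaus near 3.0M, which is what an exponentially decayed window of roughly 200 batches at ~15k frames per batch would produce; it also explains why the averaged audio_tagging_loss looks transiently high right after the restart, when only a few batches are in the window. A sketch of one accumulator shape consistent with those counters; the window size is inferred from the plateau, not read from the code:

class DecayingLossTracker:
    # Exponentially windowed totals: prior sums are scaled by (1 - 1/window)
    # before each new batch is added, so 'frames' plateaus near
    # window * frames_per_batch.
    def __init__(self, window: int = 200):
        self.decay = 1.0 - 1.0 / window
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = DecayingLossTracker()
for _ in range(10_000):
    tracker.update(batch_loss=0.065, batch_frames=15_000)
print(round(tracker.frames))  # ~3,000,000, like the plateaued counters
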
], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:23:41,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3767893.3333333335, ans=0.07 2023-11-27 06:23:55,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3767960.0, ans=0.125 2023-11-27 06:23:56,539 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565200 2023-11-27 06:24:02,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3768026.6666666665, ans=0.0 2023-11-27 06:24:15,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.28 vs. limit=10.0 2023-11-27 06:24:24,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3768093.3333333335, ans=0.125 2023-11-27 06:24:25,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2023-11-27 06:24:26,092 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 100, loss[loss=0.05995, simple_loss=0.0717, pruned_loss=0.009201, audio_tagging_loss=0.0149, over 14739.00 frames. ], tot_loss[loss=0.07363, simple_loss=0.09021, pruned_loss=0.01238, audio_tagging_loss=0.01615, over 1217108.66 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:24:26,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3768160.0, ans=10.0 2023-11-27 06:24:29,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768160.0, ans=0.1 2023-11-27 06:24:31,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3768160.0, ans=0.125 2023-11-27 06:24:35,732 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.383e+01 1.012e+02 1.072e+02 1.151e+02 1.382e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-27 06:24:39,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3768226.6666666665, ans=0.5 2023-11-27 06:24:44,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3768226.6666666665, ans=0.0 2023-11-27 06:24:46,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3768226.6666666665, ans=0.125 2023-11-27 06:24:51,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768293.3333333335, ans=0.1 2023-11-27 06:24:51,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3768293.3333333335, ans=0.125 2023-11-27 06:24:53,266 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565250 2023-11-27 06:24:59,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.23 vs. 
limit=22.5 2023-11-27 06:25:02,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3768360.0, ans=0.125 2023-11-27 06:25:03,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768360.0, ans=0.1 2023-11-27 06:25:11,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3768426.6666666665, ans=0.2 2023-11-27 06:25:13,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=15.0 2023-11-27 06:25:14,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768426.6666666665, ans=0.1 2023-11-27 06:25:21,924 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 150, loss[loss=0.05175, simple_loss=0.05701, pruned_loss=0.00713, audio_tagging_loss=0.01612, over 15195.00 frames. ], tot_loss[loss=0.07257, simple_loss=0.09144, pruned_loss=0.01262, audio_tagging_loss=0.01424, over 1620067.37 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:25:24,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3768493.3333333335, ans=0.125 2023-11-27 06:25:27,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3768493.3333333335, ans=0.125 2023-11-27 06:25:31,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3768560.0, ans=0.125 2023-11-27 06:25:35,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768560.0, ans=0.1 2023-11-27 06:25:48,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2023-11-27 06:25:49,129 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565300 2023-11-27 06:26:05,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768760.0, ans=0.1 2023-11-27 06:26:15,998 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:26:17,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3768826.6666666665, ans=0.125 2023-11-27 06:26:18,033 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 200, loss[loss=0.06777, simple_loss=0.0806, pruned_loss=0.01543, audio_tagging_loss=0.01204, over 14737.00 frames. ], tot_loss[loss=0.07096, simple_loss=0.09157, pruned_loss=0.01257, audio_tagging_loss=0.01261, over 1937988.31 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:26:28,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 9.239e+01 9.831e+01 1.046e+02 1.283e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-27 06:26:44,276 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565350 2023-11-27 06:26:46,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.25 vs. 
limit=22.5 2023-11-27 06:26:50,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2023-11-27 06:27:06,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3769093.3333333335, ans=0.1 2023-11-27 06:27:13,803 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 250, loss[loss=0.05371, simple_loss=0.07559, pruned_loss=0.006924, audio_tagging_loss=0.008995, over 14527.00 frames. ], tot_loss[loss=0.06943, simple_loss=0.09122, pruned_loss=0.0124, audio_tagging_loss=0.01142, over 2182236.49 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:27:16,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3769160.0, ans=0.125 2023-11-27 06:27:25,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3769226.6666666665, ans=0.0 2023-11-27 06:27:29,193 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:27:40,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565400 2023-11-27 06:27:44,497 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:27:47,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3769360.0, ans=0.2 2023-11-27 06:28:09,443 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 300, loss[loss=0.03884, simple_loss=0.04869, pruned_loss=0.00297, audio_tagging_loss=0.01152, over 14839.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09044, pruned_loss=0.01211, audio_tagging_loss=0.01064, over 2368946.84 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 06:28:20,486 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 9.040e+01 9.670e+01 1.035e+02 1.237e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-27 06:28:32,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2023-11-27 06:28:37,051 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565450 2023-11-27 06:29:04,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3769826.6666666665, ans=0.125 2023-11-27 06:29:05,746 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 350, loss[loss=0.08001, simple_loss=0.1129, pruned_loss=0.01669, audio_tagging_loss=0.006892, over 15747.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09048, pruned_loss=0.01217, audio_tagging_loss=0.01002, over 2517152.09 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 06:29:32,232 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565500 2023-11-27 06:29:38,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-27 06:29:39,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. 
limit=15.0 2023-11-27 06:30:01,387 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 400, loss[loss=0.0784, simple_loss=0.1133, pruned_loss=0.01422, audio_tagging_loss=0.007518, over 15597.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09027, pruned_loss=0.0121, audio_tagging_loss=0.009665, over 2636437.52 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:30:11,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.851e+01 9.492e+01 1.029e+02 1.198e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 06:30:27,500 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565550 2023-11-27 06:30:29,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3770293.3333333335, ans=0.125 2023-11-27 06:30:44,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3770360.0, ans=0.125 2023-11-27 06:30:56,995 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 450, loss[loss=0.05733, simple_loss=0.07856, pruned_loss=0.01083, audio_tagging_loss=0.007214, over 14866.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09038, pruned_loss=0.01217, audio_tagging_loss=0.009394, over 2726356.65 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:31:09,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3770560.0, ans=0.125 2023-11-27 06:31:24,671 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565600 2023-11-27 06:31:31,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2023-11-27 06:31:37,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3770693.3333333335, ans=0.1 2023-11-27 06:31:42,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3770760.0, ans=0.125 2023-11-27 06:31:45,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3770760.0, ans=0.2 2023-11-27 06:31:53,040 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 500, loss[loss=0.06028, simple_loss=0.07991, pruned_loss=0.01078, audio_tagging_loss=0.009543, over 14678.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08989, pruned_loss=0.01223, audio_tagging_loss=0.009227, over 2795530.54 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:31:53,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3770826.6666666665, ans=0.125 2023-11-27 06:31:59,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=3770826.6666666665, ans=0.2 2023-11-27 06:32:05,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 9.003e+01 9.804e+01 1.033e+02 1.335e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-27 06:32:05,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3770893.3333333335, ans=0.1 2023-11-27 06:32:20,182 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565650 2023-11-27 06:32:22,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3770960.0, ans=0.125 2023-11-27 06:32:24,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3770960.0, ans=0.125 2023-11-27 06:32:25,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=22.5 2023-11-27 06:32:31,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3771026.6666666665, ans=0.125 2023-11-27 06:32:33,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3771026.6666666665, ans=0.2 2023-11-27 06:32:35,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2023-11-27 06:32:50,025 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 550, loss[loss=0.06785, simple_loss=0.09438, pruned_loss=0.01014, audio_tagging_loss=0.01052, over 15020.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08945, pruned_loss=0.01206, audio_tagging_loss=0.009101, over 2856248.77 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:32:52,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3771160.0, ans=0.2 2023-11-27 06:32:59,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=22.5 2023-11-27 06:33:16,278 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565700 2023-11-27 06:33:33,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3771426.6666666665, ans=0.0 2023-11-27 06:33:42,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3771426.6666666665, ans=0.0 2023-11-27 06:33:45,511 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 600, loss[loss=0.05256, simple_loss=0.06399, pruned_loss=0.01184, audio_tagging_loss=0.008723, over 13969.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08934, pruned_loss=0.01208, audio_tagging_loss=0.00909, over 2897055.66 frames. 
], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:33:50,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3771493.3333333335, ans=0.125 2023-11-27 06:33:51,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3771493.3333333335, ans=0.1 2023-11-27 06:33:56,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.995e+01 9.614e+01 1.020e+02 1.289e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 06:34:11,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=22.5 2023-11-27 06:34:12,596 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565750 2023-11-27 06:34:13,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3771626.6666666665, ans=0.09899494936611666 2023-11-27 06:34:14,897 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:34:30,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3771760.0, ans=0.125 2023-11-27 06:34:41,269 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 650, loss[loss=0.07641, simple_loss=0.1041, pruned_loss=0.01837, audio_tagging_loss=0.005975, over 15197.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08973, pruned_loss=0.0122, audio_tagging_loss=0.00906, over 2930020.76 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:34:53,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=12.0 2023-11-27 06:35:08,429 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565800 2023-11-27 06:35:14,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3772026.6666666665, ans=0.125 2023-11-27 06:35:15,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3772026.6666666665, ans=0.0 2023-11-27 06:35:26,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3772093.3333333335, ans=0.125 2023-11-27 06:35:37,985 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 700, loss[loss=0.05086, simple_loss=0.06562, pruned_loss=0.009365, audio_tagging_loss=0.008691, over 15414.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08946, pruned_loss=0.01202, audio_tagging_loss=0.00889, over 2957275.68 frames. 
], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:35:44,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3772160.0, ans=0.0 2023-11-27 06:35:49,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.967e+01 9.614e+01 1.031e+02 1.404e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 06:35:50,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3772226.6666666665, ans=0.025 2023-11-27 06:35:53,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3772226.6666666665, ans=0.125 2023-11-27 06:35:56,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3772226.6666666665, ans=0.1 2023-11-27 06:36:04,679 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565850 2023-11-27 06:36:06,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3772293.3333333335, ans=0.0 2023-11-27 06:36:28,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3772426.6666666665, ans=0.125 2023-11-27 06:36:33,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3772493.3333333335, ans=0.2 2023-11-27 06:36:33,932 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 750, loss[loss=0.08279, simple_loss=0.1089, pruned_loss=0.01674, audio_tagging_loss=0.01162, over 15996.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08984, pruned_loss=0.01199, audio_tagging_loss=0.008858, over 2981551.15 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:37:01,114 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565900 2023-11-27 06:37:03,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3772626.6666666665, ans=0.04949747468305833 2023-11-27 06:37:14,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3772693.3333333335, ans=0.125 2023-11-27 06:37:17,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3772760.0, ans=0.0 2023-11-27 06:37:18,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3772760.0, ans=0.0 2023-11-27 06:37:29,632 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 800, loss[loss=0.1037, simple_loss=0.1491, pruned_loss=0.02349, audio_tagging_loss=0.005649, over 15827.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08976, pruned_loss=0.01191, audio_tagging_loss=0.008896, over 2999104.00 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:37:40,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 9.165e+01 9.801e+01 1.067e+02 1.276e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-27 06:37:51,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3772960.0, ans=0.0 2023-11-27 06:37:56,678 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565950 2023-11-27 06:38:05,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3773026.6666666665, ans=0.125 2023-11-27 06:38:26,128 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 850, loss[loss=0.08063, simple_loss=0.1195, pruned_loss=0.01463, audio_tagging_loss=0.006269, over 15177.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08832, pruned_loss=0.01173, audio_tagging_loss=0.009008, over 3012375.11 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:38:32,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3773160.0, ans=0.0 2023-11-27 06:38:52,619 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566000 2023-11-27 06:39:03,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3773360.0, ans=0.0 2023-11-27 06:39:04,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3773360.0, ans=0.0 2023-11-27 06:39:05,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3773360.0, ans=0.0 2023-11-27 06:39:11,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3773426.6666666665, ans=0.125 2023-11-27 06:39:14,468 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.544e-03 2023-11-27 06:39:18,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3773426.6666666665, ans=0.0 2023-11-27 06:39:21,795 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 900, loss[loss=0.07298, simple_loss=0.1149, pruned_loss=0.009614, audio_tagging_loss=0.005941, over 15134.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08912, pruned_loss=0.01174, audio_tagging_loss=0.009042, over 3026024.03 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:39:29,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3773493.3333333335, ans=0.125 2023-11-27 06:39:34,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.912e+01 9.588e+01 1.041e+02 1.300e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:39:49,505 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566050 2023-11-27 06:39:51,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3773626.6666666665, ans=15.0 2023-11-27 06:39:58,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3773693.3333333335, ans=0.125 2023-11-27 06:40:11,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3773760.0, ans=0.125 2023-11-27 06:40:18,024 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 950, loss[loss=0.05556, simple_loss=0.08586, pruned_loss=0.005796, audio_tagging_loss=0.006837, over 16261.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08918, pruned_loss=0.01176, audio_tagging_loss=0.008957, over 3028884.67 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:40:28,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3773893.3333333335, ans=0.1 2023-11-27 06:40:32,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3773893.3333333335, ans=0.125 2023-11-27 06:40:43,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3773960.0, ans=0.0 2023-11-27 06:40:44,828 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566100 2023-11-27 06:40:46,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3773960.0, ans=0.015 2023-11-27 06:40:47,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3773960.0, ans=0.125 2023-11-27 06:41:14,672 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1000, loss[loss=0.0661, simple_loss=0.09273, pruned_loss=0.01298, audio_tagging_loss=0.006753, over 14931.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08894, pruned_loss=0.01174, audio_tagging_loss=0.008869, over 3028424.32 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:41:20,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3774160.0, ans=0.2 2023-11-27 06:41:23,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3774160.0, ans=0.1 2023-11-27 06:41:26,266 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 9.089e+01 9.617e+01 1.045e+02 2.026e+02, threshold=1.923e+02, percent-clipped=1.0 2023-11-27 06:41:29,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3774226.6666666665, ans=0.2 2023-11-27 06:41:37,417 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:41:37,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3774293.3333333335, ans=0.125 2023-11-27 06:41:41,193 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566150 2023-11-27 06:41:49,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3774360.0, ans=0.0 2023-11-27 06:42:09,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3774493.3333333335, ans=0.95 2023-11-27 06:42:10,066 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1050, loss[loss=0.0685, simple_loss=0.09907, pruned_loss=0.01029, audio_tagging_loss=0.008669, over 15856.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08875, pruned_loss=0.01183, audio_tagging_loss=0.008752, over 3024227.65 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:42:20,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3774560.0, ans=0.125 2023-11-27 06:42:28,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0 2023-11-27 06:42:30,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3774560.0, ans=0.0 2023-11-27 06:42:37,054 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566200 2023-11-27 06:42:42,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3774693.3333333335, ans=0.05 2023-11-27 06:42:45,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0 2023-11-27 06:43:00,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3774760.0, ans=0.0 2023-11-27 06:43:00,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3774760.0, ans=0.125 2023-11-27 06:43:06,158 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1100, loss[loss=0.08396, simple_loss=0.1073, pruned_loss=0.02357, audio_tagging_loss=0.006756, over 15617.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08857, pruned_loss=0.01189, audio_tagging_loss=0.008703, over 3027065.05 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:43:07,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3774826.6666666665, ans=0.05 2023-11-27 06:43:08,304 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:43:18,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.812e+01 9.589e+01 1.029e+02 1.833e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:43:25,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3774893.3333333335, ans=0.125 2023-11-27 06:43:30,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3774960.0, ans=0.125 2023-11-27 06:43:33,083 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566250 2023-11-27 06:44:02,254 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1150, loss[loss=0.07057, simple_loss=0.09019, pruned_loss=0.01697, audio_tagging_loss=0.008503, over 14573.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08776, pruned_loss=0.01182, audio_tagging_loss=0.0087, over 3033645.76 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:44:03,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3775160.0, ans=0.0 2023-11-27 06:44:09,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3775160.0, ans=0.0 2023-11-27 06:44:28,492 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566300 2023-11-27 06:44:57,793 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1200, loss[loss=0.08422, simple_loss=0.1204, pruned_loss=0.01777, audio_tagging_loss=0.006229, over 15288.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08853, pruned_loss=0.01196, audio_tagging_loss=0.008598, over 3042042.73 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:45:08,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2023-11-27 06:45:09,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 8.861e+01 9.720e+01 1.059e+02 1.366e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 06:45:24,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3775626.6666666665, ans=0.125 2023-11-27 06:45:24,961 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566350 2023-11-27 06:45:32,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3775693.3333333335, ans=0.015 2023-11-27 06:45:37,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3775693.3333333335, ans=0.125 2023-11-27 06:45:39,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3775693.3333333335, ans=0.1 2023-11-27 06:45:41,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. 
limit=15.0 2023-11-27 06:45:43,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3775760.0, ans=0.0 2023-11-27 06:45:50,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.50 vs. limit=10.0 2023-11-27 06:45:53,280 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1250, loss[loss=0.07618, simple_loss=0.1059, pruned_loss=0.01375, audio_tagging_loss=0.009494, over 14942.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08933, pruned_loss=0.01217, audio_tagging_loss=0.008619, over 3044914.87 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:46:03,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3775826.6666666665, ans=0.0 2023-11-27 06:46:21,015 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566400 2023-11-27 06:46:21,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3775960.0, ans=0.125 2023-11-27 06:46:23,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3775960.0, ans=0.125 2023-11-27 06:46:23,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3775960.0, ans=0.125 2023-11-27 06:46:31,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0 2023-11-27 06:46:34,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3776026.6666666665, ans=0.125 2023-11-27 06:46:37,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3776093.3333333335, ans=0.2 2023-11-27 06:46:41,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3776093.3333333335, ans=0.125 2023-11-27 06:46:50,497 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1300, loss[loss=0.0615, simple_loss=0.0857, pruned_loss=0.01025, audio_tagging_loss=0.008399, over 15653.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08965, pruned_loss=0.0122, audio_tagging_loss=0.008593, over 3042037.33 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:46:50,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3776160.0, ans=0.125 2023-11-27 06:46:57,700 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:47:02,754 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 8.746e+01 9.407e+01 9.901e+01 1.217e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 06:47:10,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3776226.6666666665, ans=0.125 2023-11-27 06:47:16,613 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566450 2023-11-27 06:47:33,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.95 vs. 
limit=15.0 2023-11-27 06:47:40,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2023-11-27 06:47:46,222 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1350, loss[loss=0.06864, simple_loss=0.09437, pruned_loss=0.01373, audio_tagging_loss=0.007724, over 15008.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08796, pruned_loss=0.0119, audio_tagging_loss=0.008601, over 3040601.87 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:47:57,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3776560.0, ans=0.1 2023-11-27 06:48:12,907 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566500 2023-11-27 06:48:18,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=12.0 2023-11-27 06:48:26,732 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:48:36,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3776760.0, ans=0.1 2023-11-27 06:48:36,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2023-11-27 06:48:37,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3776760.0, ans=0.0 2023-11-27 06:48:41,676 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1400, loss[loss=0.06512, simple_loss=0.1005, pruned_loss=0.008869, audio_tagging_loss=0.006016, over 15035.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08796, pruned_loss=0.01189, audio_tagging_loss=0.008691, over 3039695.54 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:48:51,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3776826.6666666665, ans=0.1 2023-11-27 06:48:55,605 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.911e+01 9.491e+01 1.013e+02 1.381e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 06:49:09,733 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566550 2023-11-27 06:49:14,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3776960.0, ans=0.0 2023-11-27 06:49:14,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3776960.0, ans=0.1 2023-11-27 06:49:19,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3777026.6666666665, ans=0.125 2023-11-27 06:49:38,318 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1450, loss[loss=0.06619, simple_loss=0.08995, pruned_loss=0.01165, audio_tagging_loss=0.009558, over 16354.00 frames. 
], tot_loss[loss=0.06481, simple_loss=0.08856, pruned_loss=0.0119, audio_tagging_loss=0.008628, over 3044113.14 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:49:52,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3777226.6666666665, ans=0.2 2023-11-27 06:49:58,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3777226.6666666665, ans=0.0 2023-11-27 06:50:01,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3777293.3333333335, ans=0.125 2023-11-27 06:50:05,190 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566600 2023-11-27 06:50:20,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. limit=10.0 2023-11-27 06:50:25,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.19 vs. limit=22.5 2023-11-27 06:50:34,467 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1500, loss[loss=0.04653, simple_loss=0.05538, pruned_loss=0.008714, audio_tagging_loss=0.01012, over 14408.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08909, pruned_loss=0.012, audio_tagging_loss=0.008625, over 3045262.32 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:50:47,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.689e+01 9.111e+01 9.628e+01 1.038e+02 1.478e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 06:51:00,526 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566650 2023-11-27 06:51:01,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3777626.6666666665, ans=0.125 2023-11-27 06:51:18,345 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:51:19,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3777760.0, ans=10.0 2023-11-27 06:51:27,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3777760.0, ans=0.0 2023-11-27 06:51:29,862 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1550, loss[loss=0.05139, simple_loss=0.0602, pruned_loss=0.00966, audio_tagging_loss=0.01163, over 14952.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08888, pruned_loss=0.01208, audio_tagging_loss=0.00872, over 3046491.01 frames. 
], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:51:32,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3777826.6666666665, ans=0.125 2023-11-27 06:51:38,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3777826.6666666665, ans=0.125 2023-11-27 06:51:57,632 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566700 2023-11-27 06:52:00,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3777960.0, ans=0.125 2023-11-27 06:52:07,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3778026.6666666665, ans=0.2 2023-11-27 06:52:25,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3778160.0, ans=0.0 2023-11-27 06:52:26,216 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1600, loss[loss=0.0761, simple_loss=0.1069, pruned_loss=0.01482, audio_tagging_loss=0.007854, over 16194.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08885, pruned_loss=0.01195, audio_tagging_loss=0.008839, over 3049856.18 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:52:26,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3778160.0, ans=0.1 2023-11-27 06:52:40,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3778226.6666666665, ans=0.125 2023-11-27 06:52:40,788 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 9.269e+01 9.916e+01 1.064e+02 1.389e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-27 06:52:49,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3778293.3333333335, ans=0.0 2023-11-27 06:52:53,742 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566750 2023-11-27 06:52:56,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2023-11-27 06:52:57,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3778293.3333333335, ans=0.2 2023-11-27 06:53:23,718 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1650, loss[loss=0.05868, simple_loss=0.0768, pruned_loss=0.009802, audio_tagging_loss=0.01048, over 14990.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08829, pruned_loss=0.01179, audio_tagging_loss=0.008963, over 3048640.05 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:53:25,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3778493.3333333335, ans=0.0 2023-11-27 06:53:28,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3778493.3333333335, ans=0.2 2023-11-27 06:53:44,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3778626.6666666665, ans=0.125 2023-11-27 06:53:50,087 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566800 2023-11-27 06:54:05,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3778693.3333333335, ans=0.2 2023-11-27 06:54:05,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-11-27 06:54:19,591 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1700, loss[loss=0.05655, simple_loss=0.07407, pruned_loss=0.007773, audio_tagging_loss=0.01174, over 15842.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08942, pruned_loss=0.01192, audio_tagging_loss=0.008874, over 3051689.91 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:54:27,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3778826.6666666665, ans=0.1 2023-11-27 06:54:34,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.972e+01 9.443e+01 1.035e+02 1.327e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 06:54:34,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3778893.3333333335, ans=0.125 2023-11-27 06:54:39,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3778893.3333333335, ans=0.04949747468305833 2023-11-27 06:54:41,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3778960.0, ans=0.125 2023-11-27 06:54:46,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3778960.0, ans=0.1 2023-11-27 06:54:47,277 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566850 2023-11-27 06:54:54,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3779026.6666666665, ans=0.05 2023-11-27 06:55:06,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3779093.3333333335, ans=0.07 2023-11-27 06:55:08,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=15.0 2023-11-27 06:55:15,410 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1750, loss[loss=0.04776, simple_loss=0.0614, pruned_loss=0.007944, audio_tagging_loss=0.009117, over 14134.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08965, pruned_loss=0.01193, audio_tagging_loss=0.008808, over 3044165.08 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:55:17,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3779160.0, ans=0.0 2023-11-27 06:55:33,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3779226.6666666665, ans=0.0 2023-11-27 06:55:42,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566900 2023-11-27 06:55:42,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3779293.3333333335, ans=0.125 2023-11-27 06:55:58,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-27 06:56:12,160 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1800, loss[loss=0.04831, simple_loss=0.07247, pruned_loss=0.005301, audio_tagging_loss=0.006781, over 13607.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08926, pruned_loss=0.01181, audio_tagging_loss=0.00866, over 3049304.39 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:56:21,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3779493.3333333335, ans=0.125 2023-11-27 06:56:22,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3779560.0, ans=0.125 2023-11-27 06:56:26,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 9.045e+01 9.662e+01 1.034e+02 1.361e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 06:56:27,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3779560.0, ans=0.125 2023-11-27 06:56:38,860 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566950 2023-11-27 06:56:51,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3779693.3333333335, ans=0.0 2023-11-27 06:57:07,961 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1850, loss[loss=0.06008, simple_loss=0.0785, pruned_loss=0.01098, audio_tagging_loss=0.009851, over 14166.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08918, pruned_loss=0.0119, audio_tagging_loss=0.008656, over 3054435.45 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:57:34,692 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567000 2023-11-27 06:57:35,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3779960.0, ans=0.07 2023-11-27 06:57:52,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3780093.3333333335, ans=0.125 2023-11-27 06:57:53,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0 2023-11-27 06:58:04,544 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1900, loss[loss=0.07274, simple_loss=0.09874, pruned_loss=0.01558, audio_tagging_loss=0.007787, over 14869.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08865, pruned_loss=0.01184, audio_tagging_loss=0.00859, over 3042955.12 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:58:19,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.174e+01 9.812e+01 1.054e+02 1.527e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 06:58:31,866 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567050 2023-11-27 06:58:35,218 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:58:58,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=12.0 2023-11-27 06:59:00,544 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1950, loss[loss=0.06322, simple_loss=0.09345, pruned_loss=0.007193, audio_tagging_loss=0.009306, over 14603.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08802, pruned_loss=0.01171, audio_tagging_loss=0.008608, over 3039383.09 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:59:02,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3780493.3333333335, ans=0.125 2023-11-27 06:59:05,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3780493.3333333335, ans=0.125 2023-11-27 06:59:24,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3780626.6666666665, ans=0.025 2023-11-27 06:59:27,396 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567100 2023-11-27 06:59:57,510 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2000, loss[loss=0.06123, simple_loss=0.07971, pruned_loss=0.01109, audio_tagging_loss=0.01028, over 15010.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08718, pruned_loss=0.01182, audio_tagging_loss=0.008739, over 3031410.03 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:00:11,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.474e+01 9.102e+01 9.766e+01 1.042e+02 1.467e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 07:00:13,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3780893.3333333335, ans=0.125 2023-11-27 07:00:14,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3780893.3333333335, ans=0.125 2023-11-27 07:00:23,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.14 vs. limit=10.0 2023-11-27 07:00:24,177 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567150 2023-11-27 07:00:25,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3780960.0, ans=0.0 2023-11-27 07:00:39,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3781026.6666666665, ans=0.0 2023-11-27 07:00:39,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3781026.6666666665, ans=0.125 2023-11-27 07:00:46,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. 
limit=12.0 2023-11-27 07:00:52,953 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2050, loss[loss=0.06953, simple_loss=0.1027, pruned_loss=0.01236, audio_tagging_loss=0.005807, over 14873.00 frames. ], tot_loss[loss=0.06399, simple_loss=0.08718, pruned_loss=0.01171, audio_tagging_loss=0.008695, over 3032511.49 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:01:19,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=15.0 2023-11-27 07:01:20,007 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567200 2023-11-27 07:01:20,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3781293.3333333335, ans=0.2 2023-11-27 07:01:20,197 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:01:21,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.77 vs. limit=12.0 2023-11-27 07:01:25,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3781293.3333333335, ans=0.0 2023-11-27 07:01:42,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=15.0 2023-11-27 07:01:42,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2023-11-27 07:01:47,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3781426.6666666665, ans=0.125 2023-11-27 07:01:49,460 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2100, loss[loss=0.0491, simple_loss=0.05898, pruned_loss=0.009008, audio_tagging_loss=0.0106, over 13709.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08806, pruned_loss=0.01187, audio_tagging_loss=0.00863, over 3037457.96 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:01:55,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3781493.3333333335, ans=0.125 2023-11-27 07:02:04,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.912e+01 9.473e+01 1.026e+02 1.468e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 07:02:14,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3781626.6666666665, ans=0.0 2023-11-27 07:02:16,289 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567250 2023-11-27 07:02:40,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3781760.0, ans=0.0 2023-11-27 07:02:40,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3781760.0, ans=0.125 2023-11-27 07:02:43,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-27 07:02:45,483 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2150, loss[loss=0.08209, simple_loss=0.1182, pruned_loss=0.01523, audio_tagging_loss=0.007753, over 15439.00 frames. 
], tot_loss[loss=0.06448, simple_loss=0.08805, pruned_loss=0.0118, audio_tagging_loss=0.008658, over 3036768.23 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:03:08,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.14 vs. limit=22.5 2023-11-27 07:03:11,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3781960.0, ans=0.125 2023-11-27 07:03:12,862 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567300 2023-11-27 07:03:19,667 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:03:19,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3782026.6666666665, ans=0.0 2023-11-27 07:03:20,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3782026.6666666665, ans=0.125 2023-11-27 07:03:27,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-27 07:03:29,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-27 07:03:33,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3782093.3333333335, ans=0.5 2023-11-27 07:03:41,248 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2200, loss[loss=0.06028, simple_loss=0.08603, pruned_loss=0.0119, audio_tagging_loss=0.005368, over 15135.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08853, pruned_loss=0.0119, audio_tagging_loss=0.008631, over 3038769.89 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:03:46,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3782160.0, ans=0.2 2023-11-27 07:03:47,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3782160.0, ans=0.035 2023-11-27 07:03:57,270 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.960e+01 9.671e+01 1.061e+02 1.263e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-27 07:04:06,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3782293.3333333335, ans=0.1 2023-11-27 07:04:08,509 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567350 2023-11-27 07:04:12,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3782293.3333333335, ans=0.2 2023-11-27 07:04:23,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.82 vs. 
limit=10.0 2023-11-27 07:04:37,765 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2250, loss[loss=0.06502, simple_loss=0.09255, pruned_loss=0.01204, audio_tagging_loss=0.006706, over 15013.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08819, pruned_loss=0.01178, audio_tagging_loss=0.00866, over 3038755.82 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:04:42,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3782493.3333333335, ans=0.125 2023-11-27 07:04:42,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3782493.3333333335, ans=0.0 2023-11-27 07:05:04,595 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567400 2023-11-27 07:05:06,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3782626.6666666665, ans=0.0 2023-11-27 07:05:16,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3782693.3333333335, ans=0.0 2023-11-27 07:05:17,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3782693.3333333335, ans=0.125 2023-11-27 07:05:29,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3782760.0, ans=0.025 2023-11-27 07:05:34,181 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2300, loss[loss=0.06219, simple_loss=0.08283, pruned_loss=0.01335, audio_tagging_loss=0.007421, over 14214.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08829, pruned_loss=0.01181, audio_tagging_loss=0.008799, over 3044399.77 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:05:34,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3782826.6666666665, ans=0.125 2023-11-27 07:05:44,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3782893.3333333335, ans=0.2 2023-11-27 07:05:49,435 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.875e+01 9.360e+01 1.027e+02 1.398e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 07:06:01,227 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567450 2023-11-27 07:06:13,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3783026.6666666665, ans=0.0 2023-11-27 07:06:23,005 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:06:29,258 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2350, loss[loss=0.09381, simple_loss=0.1297, pruned_loss=0.02237, audio_tagging_loss=0.006583, over 16536.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08809, pruned_loss=0.01178, audio_tagging_loss=0.008833, over 3041625.28 frames. 
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:06:34,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2023-11-27 07:06:43,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3783226.6666666665, ans=0.2 2023-11-27 07:06:52,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3783293.3333333335, ans=0.125 2023-11-27 07:06:54,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3783293.3333333335, ans=0.125 2023-11-27 07:06:56,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3783293.3333333335, ans=0.125 2023-11-27 07:06:57,349 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567500 2023-11-27 07:07:26,527 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2400, loss[loss=0.07411, simple_loss=0.09257, pruned_loss=0.01579, audio_tagging_loss=0.01204, over 16159.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08819, pruned_loss=0.0118, audio_tagging_loss=0.008871, over 3037773.87 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:07:35,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-11-27 07:07:42,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.975e+01 9.647e+01 1.056e+02 1.487e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 07:07:42,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2023-11-27 07:07:44,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3783560.0, ans=0.125 2023-11-27 07:07:52,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567550 2023-11-27 07:07:56,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3783626.6666666665, ans=0.0 2023-11-27 07:08:22,608 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2450, loss[loss=0.07556, simple_loss=0.1046, pruned_loss=0.01518, audio_tagging_loss=0.008071, over 15060.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08842, pruned_loss=0.01182, audio_tagging_loss=0.008939, over 3034427.86 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:08:39,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=12.0 2023-11-27 07:08:49,931 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567600 2023-11-27 07:09:18,672 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2500, loss[loss=0.06002, simple_loss=0.08122, pruned_loss=0.01016, audio_tagging_loss=0.009244, over 14301.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08783, pruned_loss=0.01178, audio_tagging_loss=0.008944, over 3035691.48 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:09:24,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-27 07:09:36,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3784226.6666666665, ans=0.035 2023-11-27 07:09:37,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.971e+01 9.601e+01 1.047e+02 1.603e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 07:09:37,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3784226.6666666665, ans=0.0 2023-11-27 07:09:45,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3784293.3333333335, ans=0.125 2023-11-27 07:09:46,613 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567650 2023-11-27 07:09:51,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3784293.3333333335, ans=0.1 2023-11-27 07:10:16,026 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2550, loss[loss=0.05622, simple_loss=0.07932, pruned_loss=0.006783, audio_tagging_loss=0.009776, over 15430.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08893, pruned_loss=0.01206, audio_tagging_loss=0.008776, over 3029523.66 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:10:22,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3784493.3333333335, ans=0.125 2023-11-27 07:10:33,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3784560.0, ans=0.0 2023-11-27 07:10:42,460 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567700 2023-11-27 07:10:53,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3784693.3333333335, ans=0.125 2023-11-27 07:11:05,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3784760.0, ans=0.0 2023-11-27 07:11:12,326 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2600, loss[loss=0.04629, simple_loss=0.06593, pruned_loss=0.004231, audio_tagging_loss=0.0091, over 15994.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08895, pruned_loss=0.01197, audio_tagging_loss=0.008634, over 3033962.32 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:11:16,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2023-11-27 07:11:29,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.922e+01 9.501e+01 1.026e+02 1.288e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 07:11:30,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2023-11-27 07:11:35,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.86 vs. 
limit=15.0 2023-11-27 07:11:38,610 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567750 2023-11-27 07:12:02,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3785093.3333333335, ans=0.125 2023-11-27 07:12:07,905 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2650, loss[loss=0.07491, simple_loss=0.1055, pruned_loss=0.01581, audio_tagging_loss=0.006366, over 14997.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.0887, pruned_loss=0.01176, audio_tagging_loss=0.008545, over 3032347.70 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:12:13,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3785160.0, ans=0.125 2023-11-27 07:12:15,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3785160.0, ans=0.0 2023-11-27 07:12:27,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3785226.6666666665, ans=0.1 2023-11-27 07:12:31,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3785293.3333333335, ans=0.0 2023-11-27 07:12:35,620 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567800 2023-11-27 07:12:56,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3785426.6666666665, ans=0.0 2023-11-27 07:13:03,849 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2700, loss[loss=0.0653, simple_loss=0.08282, pruned_loss=0.01324, audio_tagging_loss=0.01065, over 16038.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08859, pruned_loss=0.01182, audio_tagging_loss=0.008556, over 3036624.59 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:13:12,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3785493.3333333335, ans=0.0 2023-11-27 07:13:17,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3785560.0, ans=0.1 2023-11-27 07:13:18,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-27 07:13:19,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3785560.0, ans=10.0 2023-11-27 07:13:22,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.127e+01 9.631e+01 1.043e+02 1.339e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 07:13:31,264 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567850 2023-11-27 07:13:35,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3785626.6666666665, ans=0.04949747468305833 2023-11-27 07:13:47,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3785760.0, ans=0.1 2023-11-27 07:13:49,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. 
limit=12.0 2023-11-27 07:13:55,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3785760.0, ans=0.09899494936611666 2023-11-27 07:14:00,774 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2750, loss[loss=0.06983, simple_loss=0.1045, pruned_loss=0.01187, audio_tagging_loss=0.00571, over 15054.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08852, pruned_loss=0.01189, audio_tagging_loss=0.008565, over 3035530.42 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:14:06,172 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:14:07,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3785826.6666666665, ans=0.1 2023-11-27 07:14:26,688 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567900 2023-11-27 07:14:38,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3786026.6666666665, ans=0.0 2023-11-27 07:14:48,513 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:14:56,111 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2800, loss[loss=0.07289, simple_loss=0.1058, pruned_loss=0.01268, audio_tagging_loss=0.007328, over 16010.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08788, pruned_loss=0.01173, audio_tagging_loss=0.008509, over 3039202.04 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:15:07,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3786226.6666666665, ans=0.1 2023-11-27 07:15:08,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3786226.6666666665, ans=0.0 2023-11-27 07:15:10,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3786226.6666666665, ans=0.125 2023-11-27 07:15:14,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 9.123e+01 9.680e+01 1.057e+02 2.633e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-27 07:15:24,234 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567950 2023-11-27 07:15:24,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.83 vs. limit=15.0 2023-11-27 07:15:37,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3786360.0, ans=0.125 2023-11-27 07:15:40,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0 2023-11-27 07:15:52,495 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2850, loss[loss=0.06497, simple_loss=0.09487, pruned_loss=0.01054, audio_tagging_loss=0.007, over 16171.00 frames. 
], tot_loss[loss=0.06369, simple_loss=0.0874, pruned_loss=0.0115, audio_tagging_loss=0.008491, over 3040262.50 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:16:02,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3786560.0, ans=0.125 2023-11-27 07:16:19,116 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568000 2023-11-27 07:16:33,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3786693.3333333335, ans=0.0 2023-11-27 07:16:50,525 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2900, loss[loss=0.07117, simple_loss=0.1018, pruned_loss=0.01186, audio_tagging_loss=0.008399, over 16262.00 frames. ], tot_loss[loss=0.06407, simple_loss=0.08788, pruned_loss=0.01164, audio_tagging_loss=0.00849, over 3043544.13 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:17:03,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3786893.3333333335, ans=0.125 2023-11-27 07:17:04,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=15.0 2023-11-27 07:17:07,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 8.964e+01 9.454e+01 1.013e+02 1.177e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 07:17:07,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3786893.3333333335, ans=0.125 2023-11-27 07:17:16,013 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568050 2023-11-27 07:17:26,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3787026.6666666665, ans=0.125 2023-11-27 07:17:43,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3787093.3333333335, ans=0.125 2023-11-27 07:17:45,198 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2950, loss[loss=0.06019, simple_loss=0.08159, pruned_loss=0.009688, audio_tagging_loss=0.009711, over 14128.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08923, pruned_loss=0.01194, audio_tagging_loss=0.008519, over 3046085.13 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:17:53,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3787160.0, ans=0.125 2023-11-27 07:18:11,742 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568100 2023-11-27 07:18:14,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3787293.3333333335, ans=0.0 2023-11-27 07:18:14,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3787293.3333333335, ans=0.125 2023-11-27 07:18:26,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=3787360.0, ans=15.0 2023-11-27 07:18:40,626 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3000, loss[loss=0.0603, simple_loss=0.08565, pruned_loss=0.00919, audio_tagging_loss=0.008289, over 15377.00 frames. 
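A note on the loss fields in these lines: each loss[...] block reports one batch and each tot_loss[...] block a running aggregate. The logged totals are consistent with the components being combined as 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; at batch 2550 above, 0.5 * 0.08893 + 0.01206 + 0.008776 = 0.0653, exactly the logged loss. A minimal sketch, with the weights inferred from the logged numbers rather than taken from the training code:

```python
# Hypothetical helper reproducing the "loss=" field from its logged parts.
# The 0.5 / 1.0 weights are inferred from the numbers in this log, not
# read out of train_asr.py.
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  audio_tagging_scale: float = 1.0) -> float:
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# Batch 2550 above: loss=0.0653
assert abs(combined_loss(0.08893, 0.01206, 0.008776) - 0.0653) < 5e-5
```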
], tot_loss[loss=0.06533, simple_loss=0.0895, pruned_loss=0.01206, audio_tagging_loss=0.008517, over 3049381.53 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:18:40,627 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 07:19:13,013 INFO [train_asr.py:1267] (1/4) Epoch 48, validation: loss=0.05781, simple_loss=0.05047, pruned_loss=0.005352, audio_tagging_loss=0.02722, over 4681554.00 frames. 2023-11-27 07:19:13,014 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 07:19:31,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.941e+01 8.979e+01 9.616e+01 1.040e+02 1.231e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 07:19:39,355 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568150 2023-11-27 07:19:46,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3787693.3333333335, ans=0.125 2023-11-27 07:20:08,446 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3050, loss[loss=0.06795, simple_loss=0.08928, pruned_loss=0.0131, audio_tagging_loss=0.01022, over 14750.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08922, pruned_loss=0.01201, audio_tagging_loss=0.008572, over 3055244.26 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:20:18,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3787893.3333333335, ans=0.125 2023-11-27 07:20:19,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3787893.3333333335, ans=0.1 2023-11-27 07:20:35,650 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568200 2023-11-27 07:20:38,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3787960.0, ans=22.5 2023-11-27 07:20:41,567 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:20:47,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3788026.6666666665, ans=0.025 2023-11-27 07:20:55,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3788093.3333333335, ans=0.0 2023-11-27 07:20:58,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3788093.3333333335, ans=0.1 2023-11-27 07:21:03,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5 2023-11-27 07:21:04,513 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3100, loss[loss=0.06592, simple_loss=0.09979, pruned_loss=0.00936, audio_tagging_loss=0.00666, over 14951.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.088, pruned_loss=0.01162, audio_tagging_loss=0.008656, over 3051212.29 frames. 
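A note on the optim.py Clipping_scale lines: the five numbers are a min / 25% / 50% / 75% / max summary of recent gradient norms, and in every entry here the reported threshold is Clipping_scale (2.0) times the logged median, e.g. 2.0 * 9.616e+01 = 1.923e+02 just above. percent-clipped is the share of recent updates whose norm exceeded that threshold; the one entry earlier whose max (2.633e+02) lies above its threshold logs percent-clipped=1.0. A sketch of that mechanism under these inferences; the history length is an assumed value:

```python
# Sketch of median-relative gradient clipping, matching the logged
# "threshold = 2.0 x median" relationship. The history length is an
# assumption; only the quartile/threshold logic is taken from the log.
from collections import deque
import torch

class QuartileClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 1024):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)   # recent total grad norms
        self.clipped = 0
        self.seen = 0

    def clip_(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads)).item()
        self.norms.append(norm)
        self.seen += 1
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 x median
        if norm > threshold:                           # scale gradients down
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        return threshold

    @property
    def percent_clipped(self) -> float:
        return 100.0 * self.clipped / max(self.seen, 1)
```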
], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:21:14,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3788226.6666666665, ans=0.0 2023-11-27 07:21:16,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2023-11-27 07:21:24,097 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 9.299e+01 9.781e+01 1.036e+02 1.255e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-27 07:21:27,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3788293.3333333335, ans=0.125 2023-11-27 07:21:29,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2023-11-27 07:21:31,530 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568250 2023-11-27 07:22:00,761 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3150, loss[loss=0.07021, simple_loss=0.09626, pruned_loss=0.01223, audio_tagging_loss=0.009853, over 15168.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08804, pruned_loss=0.01149, audio_tagging_loss=0.008808, over 3048941.07 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:22:17,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3788560.0, ans=0.125 2023-11-27 07:22:23,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3788626.6666666665, ans=0.125 2023-11-27 07:22:26,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.28 vs. limit=10.0 2023-11-27 07:22:26,895 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568300 2023-11-27 07:22:55,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3788826.6666666665, ans=0.0 2023-11-27 07:22:56,761 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3200, loss[loss=0.0564, simple_loss=0.07707, pruned_loss=0.007694, audio_tagging_loss=0.01017, over 15747.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08769, pruned_loss=0.01148, audio_tagging_loss=0.008874, over 3039909.01 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:23:13,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3788893.3333333335, ans=0.1 2023-11-27 07:23:15,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 9.115e+01 9.612e+01 1.036e+02 1.415e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 07:23:18,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3788960.0, ans=0.0 2023-11-27 07:23:19,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0 2023-11-27 07:23:19,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.58 vs. 
limit=15.0 2023-11-27 07:23:22,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3788960.0, ans=0.05 2023-11-27 07:23:23,219 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568350 2023-11-27 07:23:25,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-27 07:23:31,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3789026.6666666665, ans=0.0 2023-11-27 07:23:39,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2023-11-27 07:23:51,846 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3250, loss[loss=0.06918, simple_loss=0.0921, pruned_loss=0.01369, audio_tagging_loss=0.009441, over 15438.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08747, pruned_loss=0.01158, audio_tagging_loss=0.008919, over 3042709.11 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:24:06,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3789226.6666666665, ans=0.1 2023-11-27 07:24:19,676 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568400 2023-11-27 07:24:30,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2023-11-27 07:24:48,705 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3300, loss[loss=0.05916, simple_loss=0.06971, pruned_loss=0.01353, audio_tagging_loss=0.01078, over 15313.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08728, pruned_loss=0.01159, audio_tagging_loss=0.008964, over 3035899.74 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:25:07,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 9.209e+01 1.006e+02 1.091e+02 1.432e+02, threshold=2.012e+02, percent-clipped=0.0 2023-11-27 07:25:15,512 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568450 2023-11-27 07:25:28,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3789693.3333333335, ans=0.0 2023-11-27 07:25:44,969 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3350, loss[loss=0.07763, simple_loss=0.1126, pruned_loss=0.01551, audio_tagging_loss=0.005816, over 15424.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08743, pruned_loss=0.01164, audio_tagging_loss=0.008873, over 3031812.72 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:25:48,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.47 vs. 
limit=15.0 2023-11-27 07:26:10,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3789960.0, ans=10.0 2023-11-27 07:26:12,275 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568500 2023-11-27 07:26:19,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790026.6666666665, ans=0.1 2023-11-27 07:26:35,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3790093.3333333335, ans=0.95 2023-11-27 07:26:40,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3790160.0, ans=0.125 2023-11-27 07:26:40,917 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3400, loss[loss=0.06992, simple_loss=0.09477, pruned_loss=0.01056, audio_tagging_loss=0.01197, over 14723.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08759, pruned_loss=0.01161, audio_tagging_loss=0.008753, over 3034901.66 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:27:00,695 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 9.061e+01 9.680e+01 1.035e+02 1.293e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 07:27:07,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3790293.3333333335, ans=0.2 2023-11-27 07:27:08,313 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568550 2023-11-27 07:27:12,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790293.3333333335, ans=0.1 2023-11-27 07:27:27,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3790426.6666666665, ans=0.125 2023-11-27 07:27:31,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0 2023-11-27 07:27:33,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3790426.6666666665, ans=0.125 2023-11-27 07:27:37,473 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3450, loss[loss=0.04325, simple_loss=0.05334, pruned_loss=0.005656, audio_tagging_loss=0.01093, over 15916.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08821, pruned_loss=0.01165, audio_tagging_loss=0.008695, over 3043250.83 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:27:44,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=15.0 2023-11-27 07:27:48,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3790560.0, ans=0.125 2023-11-27 07:27:55,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3790560.0, ans=0.2 2023-11-27 07:28:00,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3790626.6666666665, ans=0.125 2023-11-27 07:28:04,052 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568600 2023-11-27 07:28:11,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3790693.3333333335, ans=0.125 2023-11-27 07:28:22,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790760.0, ans=0.1 2023-11-27 07:28:33,478 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3500, loss[loss=0.07406, simple_loss=0.1087, pruned_loss=0.01489, audio_tagging_loss=0.004828, over 15383.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08787, pruned_loss=0.01167, audio_tagging_loss=0.008567, over 3039928.43 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:28:45,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3790893.3333333335, ans=0.0 2023-11-27 07:28:52,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.089e+01 9.044e+01 9.629e+01 1.032e+02 1.263e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 07:29:00,469 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568650 2023-11-27 07:29:02,548 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:29:18,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3791093.3333333335, ans=0.125 2023-11-27 07:29:29,118 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3550, loss[loss=0.08018, simple_loss=0.1063, pruned_loss=0.01915, audio_tagging_loss=0.00788, over 14655.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08802, pruned_loss=0.01178, audio_tagging_loss=0.008542, over 3036545.68 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:29:29,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-11-27 07:29:52,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2023-11-27 07:29:55,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.50 vs. 
limit=22.5 2023-11-27 07:29:56,266 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568700 2023-11-27 07:30:25,372 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3600, loss[loss=0.06194, simple_loss=0.08635, pruned_loss=0.01006, audio_tagging_loss=0.008704, over 14688.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08807, pruned_loss=0.01185, audio_tagging_loss=0.008564, over 3041264.55 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:30:34,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2023-11-27 07:30:43,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.813e+01 9.473e+01 1.021e+02 1.241e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 07:30:51,175 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568750 2023-11-27 07:30:54,562 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:31:19,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3791760.0, ans=0.0 2023-11-27 07:31:20,980 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3650, loss[loss=0.08154, simple_loss=0.113, pruned_loss=0.01972, audio_tagging_loss=0.005349, over 14016.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08881, pruned_loss=0.01196, audio_tagging_loss=0.008499, over 3039333.64 frames. ], batch size: 51, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:31:21,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3791826.6666666665, ans=0.125 2023-11-27 07:31:47,527 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568800 2023-11-27 07:32:08,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3792093.3333333335, ans=0.2 2023-11-27 07:32:09,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3792093.3333333335, ans=0.2 2023-11-27 07:32:16,487 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3700, loss[loss=0.07318, simple_loss=0.09815, pruned_loss=0.01649, audio_tagging_loss=0.007625, over 14750.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08913, pruned_loss=0.01208, audio_tagging_loss=0.008539, over 3040543.93 frames. 
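A note on the grad_scale field: across the batches above it steps among 8.0, 16.0 and 32.0, always by powers of two, which is the signature of dynamic loss scaling in fp16 training: the scale is halved when an inf/nan gradient is detected and doubled again after a run of clean steps. A minimal sketch of that loop using the stock PyTorch scaler; init_scale and growth_interval here are illustrative defaults, not values measured from this run:

```python
# Sketch of the dynamic fp16 loss-scaling loop that produces power-of-two
# grad_scale values like those in the log (halve on overflow, double after
# a run of clean steps). Factors/intervals are torch defaults, assumed.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, batch, criterion):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()   # gradients carry the current scale
    scaler.step(optimizer)          # skipped internally if inf/nan found
    scaler.update()                 # halves scale on overflow, else may grow
    return loss.detach()
```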
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:32:16,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3792160.0, ans=0.2 2023-11-27 07:32:35,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3792226.6666666665, ans=0.04949747468305833 2023-11-27 07:32:35,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3792226.6666666665, ans=0.125 2023-11-27 07:32:37,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 9.068e+01 9.744e+01 1.049e+02 1.191e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 07:32:44,053 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568850 2023-11-27 07:32:45,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3792293.3333333335, ans=0.0 2023-11-27 07:33:13,036 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3750, loss[loss=0.06016, simple_loss=0.08172, pruned_loss=0.01249, audio_tagging_loss=0.006812, over 16241.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08901, pruned_loss=0.01211, audio_tagging_loss=0.008505, over 3045044.68 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:33:28,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3792560.0, ans=0.0 2023-11-27 07:33:39,735 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568900 2023-11-27 07:33:42,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3792626.6666666665, ans=0.125 2023-11-27 07:33:42,110 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:33:51,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0 2023-11-27 07:33:52,689 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:33:53,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3792693.3333333335, ans=0.125 2023-11-27 07:34:05,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3792760.0, ans=10.0 2023-11-27 07:34:05,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3792760.0, ans=0.0 2023-11-27 07:34:09,536 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3800, loss[loss=0.07872, simple_loss=0.1152, pruned_loss=0.01364, audio_tagging_loss=0.007492, over 15306.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08934, pruned_loss=0.01217, audio_tagging_loss=0.008488, over 3051521.14 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:34:19,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3792893.3333333335, ans=0.125 2023-11-27 07:34:28,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 9.318e+01 9.986e+01 1.074e+02 1.810e+02, threshold=1.997e+02, percent-clipped=0.0 2023-11-27 07:34:35,974 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568950 2023-11-27 07:34:38,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3792960.0, ans=0.125 2023-11-27 07:34:40,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3792960.0, ans=0.1 2023-11-27 07:35:01,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0 2023-11-27 07:35:04,403 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3850, loss[loss=0.07128, simple_loss=0.09773, pruned_loss=0.01268, audio_tagging_loss=0.009731, over 14355.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08902, pruned_loss=0.01204, audio_tagging_loss=0.008595, over 3057170.44 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:35:04,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3793160.0, ans=0.07 2023-11-27 07:35:08,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3793160.0, ans=0.125 2023-11-27 07:35:15,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=22.5 2023-11-27 07:35:21,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0 2023-11-27 07:35:23,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3793226.6666666665, ans=0.0 2023-11-27 07:35:31,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-27 07:35:32,173 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569000 2023-11-27 07:35:39,004 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:35:41,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3793360.0, ans=0.0 2023-11-27 07:35:42,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3793360.0, ans=0.125 2023-11-27 07:35:59,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0 2023-11-27 07:36:00,503 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3900, loss[loss=0.06632, simple_loss=0.09362, pruned_loss=0.01446, audio_tagging_loss=0.005046, over 14207.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08983, pruned_loss=0.0122, audio_tagging_loss=0.008553, over 3059206.26 frames. 
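A note on the scaling.py ScheduledFloat lines: values such as skip rates, balancer probabilities and dropout_p are not constants but functions of the batch_count clock, evaluated when used. Two inferences from the log itself: the clock appears duration-normalized rather than a raw step count (batch_count=3793360.0 against a batch idx near 569000 is a ratio of about 6.67, hence the fractional values), and by this point in training every schedule has long since settled at its final value (e.g. the bypass skip_rate ans=0.035 near the top of this excerpt). A minimal sketch of such a schedule, with made-up breakpoints:

```python
# Minimal piecewise-linear schedule of the kind the ScheduledFloat entries
# suggest: a value defined by (batch_count, value) breakpoints, clamped at
# both ends. The breakpoints below are invented for illustration.
import bisect

class PiecewiseLinear:
    def __init__(self, *points):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate decaying from 0.5 to 0.035 over the first 20k "batches"
skip_rate = PiecewiseLinear((0.0, 0.5), (20000.0, 0.035))
assert skip_rate(3793360.0) == 0.035   # long past the final breakpoint
```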
], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:36:22,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.249e+01 9.619e+01 1.030e+02 1.289e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-27 07:36:27,694 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569050 2023-11-27 07:36:43,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3793693.3333333335, ans=0.0 2023-11-27 07:36:46,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0 2023-11-27 07:36:56,877 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3950, loss[loss=0.05576, simple_loss=0.07607, pruned_loss=0.009003, audio_tagging_loss=0.008725, over 16197.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08949, pruned_loss=0.01203, audio_tagging_loss=0.008715, over 3049831.15 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:37:13,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3793893.3333333335, ans=0.1 2023-11-27 07:37:23,054 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569100 2023-11-27 07:37:32,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3794026.6666666665, ans=0.05 2023-11-27 07:37:36,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2023-11-27 07:37:52,207 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4000, loss[loss=0.04279, simple_loss=0.05245, pruned_loss=0.004893, audio_tagging_loss=0.01167, over 14788.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08925, pruned_loss=0.01192, audio_tagging_loss=0.008798, over 3053868.01 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:38:13,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 9.122e+01 1.001e+02 1.075e+02 1.362e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-27 07:38:20,088 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569150 2023-11-27 07:38:24,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3794293.3333333335, ans=0.125 2023-11-27 07:38:39,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3794426.6666666665, ans=0.125 2023-11-27 07:38:48,286 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4050, loss[loss=0.07767, simple_loss=0.1147, pruned_loss=0.01152, audio_tagging_loss=0.008802, over 14625.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08956, pruned_loss=0.01189, audio_tagging_loss=0.008795, over 3059341.87 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:38:49,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3794493.3333333335, ans=0.2 2023-11-27 07:38:51,498 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:39:02,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3794560.0, ans=0.2 2023-11-27 07:39:05,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3794560.0, ans=0.0 2023-11-27 07:39:09,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.64 vs. limit=15.0 2023-11-27 07:39:11,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=15.0 2023-11-27 07:39:14,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3794626.6666666665, ans=0.1 2023-11-27 07:39:15,563 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569200 2023-11-27 07:39:18,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3794626.6666666665, ans=0.95 2023-11-27 07:39:19,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3794626.6666666665, ans=0.0 2023-11-27 07:39:34,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.71 vs. limit=15.0 2023-11-27 07:39:39,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3794760.0, ans=0.1 2023-11-27 07:39:44,867 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4100, loss[loss=0.05935, simple_loss=0.08217, pruned_loss=0.007465, audio_tagging_loss=0.01081, over 15567.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08997, pruned_loss=0.01186, audio_tagging_loss=0.008806, over 3057850.25 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:39:45,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3794826.6666666665, ans=0.125 2023-11-27 07:39:53,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3794826.6666666665, ans=0.2 2023-11-27 07:40:04,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3794893.3333333335, ans=0.025 2023-11-27 07:40:05,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 9.057e+01 9.659e+01 1.032e+02 2.111e+02, threshold=1.932e+02, percent-clipped=1.0 2023-11-27 07:40:11,105 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569250 2023-11-27 07:40:11,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3794960.0, ans=0.0 2023-11-27 07:40:12,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.41 vs. 
limit=22.5 2023-11-27 07:40:27,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3795026.6666666665, ans=0.125 2023-11-27 07:40:29,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3795093.3333333335, ans=0.0 2023-11-27 07:40:40,540 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4150, loss[loss=0.0794, simple_loss=0.1096, pruned_loss=0.01605, audio_tagging_loss=0.00856, over 16018.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09038, pruned_loss=0.01207, audio_tagging_loss=0.008639, over 3052081.86 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:40:40,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3795160.0, ans=0.125 2023-11-27 07:40:46,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3795160.0, ans=0.1 2023-11-27 07:40:57,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3795226.6666666665, ans=0.125 2023-11-27 07:41:00,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3795226.6666666665, ans=0.2 2023-11-27 07:41:05,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2023-11-27 07:41:07,131 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569300 2023-11-27 07:41:13,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3795360.0, ans=0.1 2023-11-27 07:41:22,167 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:41:26,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3795426.6666666665, ans=0.1 2023-11-27 07:41:29,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3795426.6666666665, ans=0.125 2023-11-27 07:41:35,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.84 vs. limit=15.0 2023-11-27 07:41:36,056 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4200, loss[loss=0.0797, simple_loss=0.1181, pruned_loss=0.01572, audio_tagging_loss=0.004922, over 15540.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09095, pruned_loss=0.01213, audio_tagging_loss=0.00859, over 3055958.11 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:41:41,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.98 vs. 
limit=12.0 2023-11-27 07:41:45,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5 2023-11-27 07:41:51,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3795560.0, ans=0.1 2023-11-27 07:41:58,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 9.145e+01 9.853e+01 1.047e+02 1.662e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 07:42:01,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3795626.6666666665, ans=0.125 2023-11-27 07:42:01,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3795626.6666666665, ans=0.0 2023-11-27 07:42:03,836 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569350 2023-11-27 07:42:04,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3795626.6666666665, ans=0.04949747468305833 2023-11-27 07:42:06,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2023-11-27 07:42:15,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3795693.3333333335, ans=0.125 2023-11-27 07:42:21,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2023-11-27 07:42:30,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3795760.0, ans=0.0 2023-11-27 07:42:32,524 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4250, loss[loss=0.05842, simple_loss=0.08084, pruned_loss=0.008018, audio_tagging_loss=0.009983, over 14499.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09046, pruned_loss=0.01211, audio_tagging_loss=0.008528, over 3052496.73 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:42:46,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3795893.3333333335, ans=0.125 2023-11-27 07:42:49,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3795893.3333333335, ans=0.2 2023-11-27 07:42:50,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3795893.3333333335, ans=0.125 2023-11-27 07:42:59,163 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569400 2023-11-27 07:43:29,342 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4300, loss[loss=0.06358, simple_loss=0.09, pruned_loss=0.01051, audio_tagging_loss=0.008076, over 16205.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08992, pruned_loss=0.01208, audio_tagging_loss=0.008485, over 3044410.28 frames. 
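A note on the scaling.py Whitening lines: each compares a statistic of a module's feature covariance against a limit (metric=... vs. limit=...). Most entries sit below their limit; occasionally one lands above (e.g. 23.50 vs. 22.5 earlier), which is when a whitening penalty is designed to engage. The exact formula lives in scaling.py; the sketch below uses one natural statistic with the same qualitative behavior, equal to 1.0 for perfectly whitened (isotropic) features and growing toward the channel count as one direction dominates. It is an illustration, not the code's formula:

```python
# Sketch of a whitening-style statistic: d * tr(C^2) / tr(C)^2 over the
# feature covariance C. It equals 1.0 iff C is proportional to the identity
# (fully "white") and approaches d when one direction dominates. This is an
# illustrative choice, not necessarily the formula used in scaling.py.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one module."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                  # (d, d) covariance
    d = cov.shape[0]
    return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

white = torch.randn(4096, 256)                    # ~isotropic features
assert abs(whitening_metric(white) - 1.0) < 0.15  # near the ideal value 1.0
```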
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:43:36,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3796160.0, ans=0.2 2023-11-27 07:43:38,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3796160.0, ans=0.0 2023-11-27 07:43:41,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3796226.6666666665, ans=0.1 2023-11-27 07:43:50,071 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.134e+01 9.847e+01 1.057e+02 1.328e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-27 07:43:51,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3796293.3333333335, ans=0.125 2023-11-27 07:43:53,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3796293.3333333335, ans=0.125 2023-11-27 07:43:56,047 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569450 2023-11-27 07:44:11,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3796360.0, ans=0.0 2023-11-27 07:44:11,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2023-11-27 07:44:24,884 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4350, loss[loss=0.06358, simple_loss=0.09467, pruned_loss=0.009446, audio_tagging_loss=0.006797, over 14396.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.0901, pruned_loss=0.01205, audio_tagging_loss=0.008367, over 3047701.81 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:44:49,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3796626.6666666665, ans=0.0 2023-11-27 07:44:50,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3796626.6666666665, ans=0.0 2023-11-27 07:44:52,286 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569500 2023-11-27 07:44:57,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3796626.6666666665, ans=0.0 2023-11-27 07:44:57,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3796626.6666666665, ans=0.125 2023-11-27 07:45:06,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.74 vs. 
limit=15.0 2023-11-27 07:45:06,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3796693.3333333335, ans=0.05 2023-11-27 07:45:12,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3796760.0, ans=0.5 2023-11-27 07:45:15,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3796760.0, ans=0.125 2023-11-27 07:45:17,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3796760.0, ans=0.025 2023-11-27 07:45:19,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3796760.0, ans=0.1 2023-11-27 07:45:20,910 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4400, loss[loss=0.05717, simple_loss=0.07672, pruned_loss=0.009782, audio_tagging_loss=0.009028, over 15629.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09048, pruned_loss=0.01217, audio_tagging_loss=0.008382, over 3041763.46 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:45:26,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3796826.6666666665, ans=0.2 2023-11-27 07:45:42,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.963e+01 9.534e+01 1.011e+02 1.280e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 07:45:47,992 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569550 2023-11-27 07:45:53,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797026.6666666665, ans=0.1 2023-11-27 07:45:54,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3797026.6666666665, ans=0.125 2023-11-27 07:46:09,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.60 vs. limit=22.5 2023-11-27 07:46:16,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3797160.0, ans=0.125 2023-11-27 07:46:17,080 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4450, loss[loss=0.05845, simple_loss=0.09194, pruned_loss=0.006349, audio_tagging_loss=0.006136, over 16755.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08991, pruned_loss=0.01215, audio_tagging_loss=0.008403, over 3049106.79 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:46:43,694 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569600 2023-11-27 07:47:12,993 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4500, loss[loss=0.05601, simple_loss=0.07486, pruned_loss=0.01029, audio_tagging_loss=0.008287, over 14080.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08937, pruned_loss=0.01209, audio_tagging_loss=0.008401, over 3043427.12 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:47:35,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.963e+01 9.073e+01 9.744e+01 1.047e+02 1.558e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 07:47:35,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3797626.6666666665, ans=0.0 2023-11-27 07:47:40,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569650 2023-11-27 07:47:43,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3797626.6666666665, ans=0.125 2023-11-27 07:47:45,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=12.0 2023-11-27 07:47:50,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.63 vs. limit=15.0 2023-11-27 07:48:08,488 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4550, loss[loss=0.04057, simple_loss=0.04815, pruned_loss=0.0066, audio_tagging_loss=0.009891, over 14284.00 frames. ], tot_loss[loss=0.065, simple_loss=0.0891, pruned_loss=0.01205, audio_tagging_loss=0.008395, over 3039241.70 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:48:19,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3797893.3333333335, ans=0.2 2023-11-27 07:48:19,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797893.3333333335, ans=0.1 2023-11-27 07:48:35,683 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569700 2023-11-27 07:48:39,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3797960.0, ans=0.0 2023-11-27 07:48:49,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3798026.6666666665, ans=0.125 2023-11-27 07:48:52,297 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:49:05,198 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4600, loss[loss=0.06192, simple_loss=0.07653, pruned_loss=0.01246, audio_tagging_loss=0.01119, over 13602.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08813, pruned_loss=0.01183, audio_tagging_loss=0.008503, over 3038633.86 frames. 
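A note on the recurring "Exclude cut" warnings, such as the one just above: these one-second AudioSet clips have 100 feature frames, which shrink to 23 frames after the convolutional frontend, while the placeholder transcript tokenizes to 24 BPE pieces, so the subsampled frame count is smaller than the token count and the cut is dropped (this recipe's transducer emits at most one symbol per frame, so such a cut has no valid alignment). A sketch of the filter; the subsampling formula is an assumption chosen to reproduce the logged 100 -> 23:

```python
# Sketch of the filter behind the "Exclude cut" warnings. The subsampling
# arithmetic is an assumption that reproduces the logged 100 -> 23; the
# exclusion rule itself (tokens must fit in the subsampled frames) is what
# the warning text describes.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23   # matches the warning above
assert not keep_cut(100, 24)                 # 24 tokens > 23 frames: excluded
```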
], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:49:05,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3798160.0, ans=0.1 2023-11-27 07:49:09,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3798160.0, ans=0.125 2023-11-27 07:49:27,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.887e+01 8.803e+01 9.390e+01 1.011e+02 1.144e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 07:49:32,002 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569750 2023-11-27 07:50:00,871 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4650, loss[loss=0.06919, simple_loss=0.09371, pruned_loss=0.01285, audio_tagging_loss=0.009489, over 16321.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08782, pruned_loss=0.01181, audio_tagging_loss=0.008522, over 3037390.78 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:50:25,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3798626.6666666665, ans=0.125 2023-11-27 07:50:28,634 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569800 2023-11-27 07:50:57,442 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4700, loss[loss=0.06085, simple_loss=0.07287, pruned_loss=0.01162, audio_tagging_loss=0.01279, over 14430.00 frames. ], tot_loss[loss=0.06383, simple_loss=0.08708, pruned_loss=0.01165, audio_tagging_loss=0.008644, over 3031680.87 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:50:59,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3798826.6666666665, ans=0.125 2023-11-27 07:51:01,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3798826.6666666665, ans=0.125 2023-11-27 07:51:13,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3798893.3333333335, ans=0.2 2023-11-27 07:51:19,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.828e+01 9.714e+01 1.039e+02 1.424e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 07:51:23,965 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569850 2023-11-27 07:51:53,640 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4750, loss[loss=0.05693, simple_loss=0.07667, pruned_loss=0.009859, audio_tagging_loss=0.008734, over 15260.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08786, pruned_loss=0.01177, audio_tagging_loss=0.008666, over 3033594.03 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:52:02,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3799160.0, ans=0.1 2023-11-27 07:52:04,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3799226.6666666665, ans=0.0 2023-11-27 07:52:06,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3799226.6666666665, ans=0.125 2023-11-27 07:52:14,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3799293.3333333335, ans=0.5 2023-11-27 07:52:20,335 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569900 2023-11-27 07:52:24,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3799293.3333333335, ans=0.1 2023-11-27 07:52:31,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.08 vs. limit=15.0 2023-11-27 07:52:33,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3799360.0, ans=0.2 2023-11-27 07:52:39,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3799426.6666666665, ans=0.2 2023-11-27 07:52:40,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3799426.6666666665, ans=0.2 2023-11-27 07:52:48,851 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4800, loss[loss=0.07292, simple_loss=0.09704, pruned_loss=0.01494, audio_tagging_loss=0.009462, over 15370.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08835, pruned_loss=0.01194, audio_tagging_loss=0.008765, over 3040308.98 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:52:51,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3799493.3333333335, ans=0.125 2023-11-27 07:53:11,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 9.043e+01 9.728e+01 1.036e+02 1.523e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 07:53:11,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3799626.6666666665, ans=0.125 2023-11-27 07:53:13,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3799626.6666666665, ans=0.0 2023-11-27 07:53:16,495 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569950 2023-11-27 07:53:45,023 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4850, loss[loss=0.07109, simple_loss=0.09744, pruned_loss=0.01502, audio_tagging_loss=0.007353, over 15977.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08849, pruned_loss=0.01186, audio_tagging_loss=0.008876, over 3037257.70 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:53:53,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3799826.6666666665, ans=0.0 2023-11-27 07:53:54,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3799826.6666666665, ans=0.09899494936611666 2023-11-27 07:54:09,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3799960.0, ans=0.125 2023-11-27 07:54:11,677 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570000 2023-11-27 07:54:21,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3800026.6666666665, ans=0.125 2023-11-27 07:54:31,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0 2023-11-27 07:54:41,097 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4900, loss[loss=0.04688, simple_loss=0.06149, pruned_loss=0.006725, audio_tagging_loss=0.009408, over 15984.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08873, pruned_loss=0.01199, audio_tagging_loss=0.008933, over 3046153.26 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:54:44,574 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:54:49,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3800160.0, ans=0.02 2023-11-27 07:55:01,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3800293.3333333335, ans=0.0 2023-11-27 07:55:02,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 9.110e+01 9.715e+01 1.029e+02 1.331e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 07:55:03,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3800293.3333333335, ans=0.0 2023-11-27 07:55:06,618 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570050 2023-11-27 07:55:13,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3800360.0, ans=0.1 2023-11-27 07:55:23,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3800360.0, ans=0.1 2023-11-27 07:55:36,375 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4950, loss[loss=0.0501, simple_loss=0.06397, pruned_loss=0.00753, audio_tagging_loss=0.01058, over 14247.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08855, pruned_loss=0.01214, audio_tagging_loss=0.008795, over 3036075.66 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:55:42,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3800493.3333333335, ans=0.125 2023-11-27 07:55:45,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3800493.3333333335, ans=0.0 2023-11-27 07:56:04,231 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570100 2023-11-27 07:56:10,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3800693.3333333335, ans=0.025 2023-11-27 07:56:16,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3800693.3333333335, ans=0.125 2023-11-27 07:56:27,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3800760.0, ans=0.1 2023-11-27 07:56:31,911 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5000, loss[loss=0.04832, simple_loss=0.06737, pruned_loss=0.008569, audio_tagging_loss=0.006061, over 15265.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.0887, pruned_loss=0.01208, audio_tagging_loss=0.008588, over 3037660.79 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:56:34,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3800826.6666666665, ans=0.125 2023-11-27 07:56:35,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0 2023-11-27 07:56:54,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3800960.0, ans=0.125 2023-11-27 07:56:55,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.884e+01 9.444e+01 1.042e+02 1.203e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 07:56:59,618 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570150 2023-11-27 07:57:12,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3801026.6666666665, ans=0.125 2023-11-27 07:57:24,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3801093.3333333335, ans=10.0 2023-11-27 07:57:24,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=22.5 2023-11-27 07:57:25,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3801093.3333333335, ans=0.0 2023-11-27 07:57:29,492 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5050, loss[loss=0.05634, simple_loss=0.07446, pruned_loss=0.008648, audio_tagging_loss=0.01046, over 14728.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08963, pruned_loss=0.01206, audio_tagging_loss=0.008502, over 3035639.97 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:57:29,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3801160.0, ans=0.0 2023-11-27 07:57:37,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-11-27 07:57:50,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3801293.3333333335, ans=0.2 2023-11-27 07:57:54,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-27 07:57:54,978 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570200 2023-11-27 07:58:00,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2023-11-27 07:58:01,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3801360.0, ans=0.0 2023-11-27 07:58:07,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3801360.0, ans=0.125 2023-11-27 07:58:07,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3801360.0, ans=0.125 2023-11-27 07:58:14,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3801426.6666666665, ans=0.1 2023-11-27 07:58:24,947 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5100, loss[loss=0.07791, simple_loss=0.1113, pruned_loss=0.01695, audio_tagging_loss=0.005312, over 14314.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08846, pruned_loss=0.01173, audio_tagging_loss=0.008484, over 3032509.04 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:58:35,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3801560.0, ans=0.2 2023-11-27 07:58:46,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.907e+01 8.861e+01 9.486e+01 1.041e+02 1.352e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 07:58:51,772 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570250 2023-11-27 07:59:00,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3801693.3333333335, ans=0.125 2023-11-27 07:59:06,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3801693.3333333335, ans=0.125 2023-11-27 07:59:07,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.04 vs. 
limit=5.0 2023-11-27 07:59:10,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3801760.0, ans=0.0 2023-11-27 07:59:15,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3801760.0, ans=0.125 2023-11-27 07:59:18,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3801826.6666666665, ans=0.0 2023-11-27 07:59:19,741 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5150, loss[loss=0.06423, simple_loss=0.08479, pruned_loss=0.01122, audio_tagging_loss=0.01062, over 16207.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08907, pruned_loss=0.01194, audio_tagging_loss=0.008389, over 3036178.67 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:59:22,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3801826.6666666665, ans=0.125 2023-11-27 07:59:24,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3801826.6666666665, ans=0.125 2023-11-27 07:59:28,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3801826.6666666665, ans=0.2 2023-11-27 07:59:31,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3801893.3333333335, ans=10.0 2023-11-27 07:59:38,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3801893.3333333335, ans=0.125 2023-11-27 07:59:46,787 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570300 2023-11-27 07:59:57,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3802026.6666666665, ans=0.125 2023-11-27 08:00:15,510 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5200, loss[loss=0.09574, simple_loss=0.1381, pruned_loss=0.02088, audio_tagging_loss=0.005827, over 15012.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08981, pruned_loss=0.012, audio_tagging_loss=0.008345, over 3039215.55 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 08:00:33,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3802226.6666666665, ans=0.125 2023-11-27 08:00:39,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 9.152e+01 9.640e+01 1.026e+02 1.239e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 08:00:42,058 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570350 2023-11-27 08:00:50,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3802360.0, ans=0.125 2023-11-27 08:01:11,804 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5250, loss[loss=0.0697, simple_loss=0.09265, pruned_loss=0.01654, audio_tagging_loss=0.006836, over 14991.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08838, pruned_loss=0.01188, audio_tagging_loss=0.00842, over 3038870.36 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:01:20,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3802493.3333333335, ans=0.125 2023-11-27 08:01:22,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0 2023-11-27 08:01:38,158 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570400 2023-11-27 08:01:55,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3802693.3333333335, ans=0.125 2023-11-27 08:02:07,493 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5300, loss[loss=0.07046, simple_loss=0.09707, pruned_loss=0.01385, audio_tagging_loss=0.008075, over 15411.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08849, pruned_loss=0.01188, audio_tagging_loss=0.008526, over 3035637.90 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:02:09,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=15.0 2023-11-27 08:02:14,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2023-11-27 08:02:18,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3802893.3333333335, ans=0.0 2023-11-27 08:02:23,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3802893.3333333335, ans=0.2 2023-11-27 08:02:31,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3802960.0, ans=0.2 2023-11-27 08:02:33,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 9.130e+01 9.779e+01 1.044e+02 2.518e+02, threshold=1.956e+02, percent-clipped=1.0 2023-11-27 08:02:35,408 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570450 2023-11-27 08:02:46,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3803026.6666666665, ans=0.0 2023-11-27 08:02:50,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3803026.6666666665, ans=0.125 2023-11-27 08:03:03,256 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5350, loss[loss=0.06235, simple_loss=0.08925, pruned_loss=0.008904, audio_tagging_loss=0.008821, over 14841.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08972, pruned_loss=0.01209, audio_tagging_loss=0.008491, over 3041164.34 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:03:27,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. 
limit=6.0 2023-11-27 08:03:29,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3803293.3333333335, ans=0.125 2023-11-27 08:03:30,622 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570500 2023-11-27 08:03:43,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3803360.0, ans=0.0 2023-11-27 08:03:56,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3803426.6666666665, ans=0.1 2023-11-27 08:03:59,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3803493.3333333335, ans=0.2 2023-11-27 08:04:00,238 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5400, loss[loss=0.07928, simple_loss=0.1198, pruned_loss=0.01562, audio_tagging_loss=0.003759, over 15712.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08963, pruned_loss=0.01196, audio_tagging_loss=0.008549, over 3044595.05 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:04:25,134 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.928e+01 9.462e+01 1.035e+02 1.260e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 08:04:26,235 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570550 2023-11-27 08:04:44,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=12.0 2023-11-27 08:04:55,117 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5450, loss[loss=0.06934, simple_loss=0.1053, pruned_loss=0.009241, audio_tagging_loss=0.007445, over 16276.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08953, pruned_loss=0.01195, audio_tagging_loss=0.008575, over 3053333.10 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:05:22,205 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570600 2023-11-27 08:05:29,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0 2023-11-27 08:05:46,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3804093.3333333335, ans=0.07 2023-11-27 08:05:50,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3804160.0, ans=0.125 2023-11-27 08:05:51,032 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5500, loss[loss=0.06402, simple_loss=0.07612, pruned_loss=0.01515, audio_tagging_loss=0.01081, over 15751.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08928, pruned_loss=0.01196, audio_tagging_loss=0.008664, over 3058310.61 frames. 
], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:05:58,792 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:06:16,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 9.180e+01 9.726e+01 1.043e+02 1.311e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 08:06:18,117 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570650 2023-11-27 08:06:43,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3804426.6666666665, ans=0.125 2023-11-27 08:06:47,590 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5550, loss[loss=0.06653, simple_loss=0.08169, pruned_loss=0.0175, audio_tagging_loss=0.008187, over 14380.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08957, pruned_loss=0.01199, audio_tagging_loss=0.008723, over 3062380.25 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:07:03,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3804560.0, ans=0.125 2023-11-27 08:07:06,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2023-11-27 08:07:14,290 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570700 2023-11-27 08:07:43,567 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5600, loss[loss=0.07696, simple_loss=0.1169, pruned_loss=0.01215, audio_tagging_loss=0.006333, over 16038.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08962, pruned_loss=0.01187, audio_tagging_loss=0.008722, over 3053205.26 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:07:46,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3804826.6666666665, ans=0.0 2023-11-27 08:07:46,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3804826.6666666665, ans=0.2 2023-11-27 08:07:55,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3804893.3333333335, ans=0.0 2023-11-27 08:08:10,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.878e+01 8.987e+01 9.756e+01 1.044e+02 1.605e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 08:08:10,627 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570750 2023-11-27 08:08:23,707 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 08:08:37,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3805093.3333333335, ans=0.125 2023-11-27 08:08:38,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3805160.0, ans=0.0 2023-11-27 08:08:39,235 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5650, loss[loss=0.06024, simple_loss=0.08102, pruned_loss=0.01133, audio_tagging_loss=0.008411, over 14024.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08799, pruned_loss=0.01168, audio_tagging_loss=0.008867, over 3052452.43 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:08:40,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3805160.0, ans=0.125 2023-11-27 08:08:41,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3805160.0, ans=0.0 2023-11-27 08:08:41,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3805160.0, ans=0.125 2023-11-27 08:08:49,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3805160.0, ans=0.2 2023-11-27 08:09:04,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-27 08:09:06,198 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570800 2023-11-27 08:09:07,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3805293.3333333335, ans=0.1 2023-11-27 08:09:35,534 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5700, loss[loss=0.07629, simple_loss=0.1096, pruned_loss=0.01398, audio_tagging_loss=0.00749, over 16141.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08879, pruned_loss=0.01184, audio_tagging_loss=0.008819, over 3058892.50 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:09:38,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0 2023-11-27 08:09:42,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3805493.3333333335, ans=0.125 2023-11-27 08:09:57,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3805626.6666666665, ans=0.1 2023-11-27 08:09:58,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.76 vs. 
limit=22.5 2023-11-27 08:10:01,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.888e+01 9.534e+01 1.037e+02 1.369e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 08:10:01,857 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570850 2023-11-27 08:10:05,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3805626.6666666665, ans=0.0 2023-11-27 08:10:09,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3805693.3333333335, ans=0.125 2023-11-27 08:10:30,874 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5750, loss[loss=0.0565, simple_loss=0.07061, pruned_loss=0.008587, audio_tagging_loss=0.01261, over 14644.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08784, pruned_loss=0.0117, audio_tagging_loss=0.008747, over 3054747.30 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:10:58,282 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570900 2023-11-27 08:11:01,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3805960.0, ans=0.1 2023-11-27 08:11:08,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=22.5 2023-11-27 08:11:09,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.19 vs. limit=10.0 2023-11-27 08:11:17,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. limit=6.0 2023-11-27 08:11:27,062 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5800, loss[loss=0.04177, simple_loss=0.05017, pruned_loss=0.006254, audio_tagging_loss=0.01043, over 16248.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08816, pruned_loss=0.01185, audio_tagging_loss=0.008633, over 3051733.12 frames. ], batch size: 63, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:11:33,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806160.0, ans=0.1 2023-11-27 08:11:37,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3806226.6666666665, ans=0.0 2023-11-27 08:11:45,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3806226.6666666665, ans=0.0 2023-11-27 08:11:53,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.966e+01 9.206e+01 9.616e+01 1.021e+02 1.551e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 08:11:53,756 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570950 2023-11-27 08:11:59,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3806360.0, ans=0.05 2023-11-27 08:12:05,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2023-11-27 08:12:07,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.82 vs. 
limit=22.5 2023-11-27 08:12:08,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3806360.0, ans=0.0 2023-11-27 08:12:22,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3806493.3333333335, ans=0.125 2023-11-27 08:12:23,486 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5850, loss[loss=0.055, simple_loss=0.0831, pruned_loss=0.005589, audio_tagging_loss=0.00786, over 14526.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08855, pruned_loss=0.01185, audio_tagging_loss=0.008609, over 3045175.54 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:12:26,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3806493.3333333335, ans=0.0 2023-11-27 08:12:26,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3806493.3333333335, ans=0.125 2023-11-27 08:12:29,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806493.3333333335, ans=0.1 2023-11-27 08:12:45,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3806626.6666666665, ans=0.125 2023-11-27 08:12:49,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3806626.6666666665, ans=0.0 2023-11-27 08:12:49,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806626.6666666665, ans=0.1 2023-11-27 08:12:49,872 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571000 2023-11-27 08:13:02,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3806693.3333333335, ans=0.125 2023-11-27 08:13:14,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=22.5 2023-11-27 08:13:18,640 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5900, loss[loss=0.05996, simple_loss=0.08543, pruned_loss=0.009675, audio_tagging_loss=0.007574, over 14959.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08835, pruned_loss=0.01173, audio_tagging_loss=0.008585, over 3039641.99 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:13:41,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3806960.0, ans=0.125 2023-11-27 08:13:45,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 9.203e+01 9.720e+01 1.067e+02 1.821e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 08:13:45,625 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571050 2023-11-27 08:14:14,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.67 vs. limit=15.0 2023-11-27 08:14:14,865 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5950, loss[loss=0.04272, simple_loss=0.04788, pruned_loss=0.004377, audio_tagging_loss=0.0144, over 14579.00 frames. 
], tot_loss[loss=0.06446, simple_loss=0.08805, pruned_loss=0.01176, audio_tagging_loss=0.008674, over 3041216.76 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:14:24,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3807226.6666666665, ans=0.2 2023-11-27 08:14:39,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.43 vs. limit=22.5 2023-11-27 08:14:41,371 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571100 2023-11-27 08:15:08,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3807426.6666666665, ans=0.125 2023-11-27 08:15:10,286 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6000, loss[loss=0.05408, simple_loss=0.06693, pruned_loss=0.009915, audio_tagging_loss=0.0107, over 16258.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08739, pruned_loss=0.01175, audio_tagging_loss=0.00867, over 3034840.53 frames. ], batch size: 64, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:15:10,286 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 08:15:40,487 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4981, 2.9958, 3.2425, 2.9756, 3.6068, 3.7481, 3.2655, 3.2116], device='cuda:1') 2023-11-27 08:15:42,624 INFO [train_asr.py:1267] (1/4) Epoch 48, validation: loss=0.05815, simple_loss=0.05046, pruned_loss=0.005371, audio_tagging_loss=0.02755, over 4681554.00 frames. 2023-11-27 08:15:42,624 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 08:15:58,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3807560.0, ans=0.95 2023-11-27 08:16:04,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. limit=10.0 2023-11-27 08:16:10,243 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.889e+01 9.644e+01 1.039e+02 1.494e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 08:16:10,331 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571150 2023-11-27 08:16:10,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3807626.6666666665, ans=0.125 2023-11-27 08:16:20,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3807693.3333333335, ans=0.0 2023-11-27 08:16:24,050 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 08:16:26,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3807760.0, ans=0.0 2023-11-27 08:16:34,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3807760.0, ans=10.0 2023-11-27 08:16:39,039 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6050, loss[loss=0.06394, simple_loss=0.09049, pruned_loss=0.009805, audio_tagging_loss=0.008893, over 15564.00 frames. ], tot_loss[loss=0.06392, simple_loss=0.08743, pruned_loss=0.01158, audio_tagging_loss=0.008632, over 3035702.50 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:16:44,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3807826.6666666665, ans=0.125 2023-11-27 08:16:51,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3807893.3333333335, ans=0.125 2023-11-27 08:17:05,688 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571200 2023-11-27 08:17:13,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3808026.6666666665, ans=0.0 2023-11-27 08:17:22,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2023-11-27 08:17:35,806 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6100, loss[loss=0.08534, simple_loss=0.1213, pruned_loss=0.01685, audio_tagging_loss=0.007863, over 15825.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08795, pruned_loss=0.01155, audio_tagging_loss=0.008637, over 3038631.06 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:17:36,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3808160.0, ans=0.125 2023-11-27 08:17:36,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3808160.0, ans=0.0 2023-11-27 08:17:38,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3808160.0, ans=0.125 2023-11-27 08:17:44,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3808160.0, ans=0.2 2023-11-27 08:17:46,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-27 08:17:54,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.92 vs. 
limit=15.0 2023-11-27 08:18:01,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.030e+01 9.632e+01 1.027e+02 1.334e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 08:18:01,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571250 2023-11-27 08:18:04,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3808293.3333333335, ans=0.125 2023-11-27 08:18:27,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3808426.6666666665, ans=0.125 2023-11-27 08:18:29,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3808426.6666666665, ans=0.125 2023-11-27 08:18:31,166 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6150, loss[loss=0.0725, simple_loss=0.1027, pruned_loss=0.01334, audio_tagging_loss=0.007814, over 14933.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08881, pruned_loss=0.0118, audio_tagging_loss=0.008563, over 3039136.79 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:18:57,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=3808626.6666666665, ans=0.1 2023-11-27 08:18:58,670 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571300 2023-11-27 08:18:59,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3808626.6666666665, ans=0.1 2023-11-27 08:19:09,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3808693.3333333335, ans=0.125 2023-11-27 08:19:18,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3808760.0, ans=0.0 2023-11-27 08:19:26,893 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6200, loss[loss=0.0608, simple_loss=0.08109, pruned_loss=0.01222, audio_tagging_loss=0.008039, over 14617.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08809, pruned_loss=0.01173, audio_tagging_loss=0.008723, over 3037984.30 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:19:36,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3808826.6666666665, ans=0.125 2023-11-27 08:19:37,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3808893.3333333335, ans=0.125 2023-11-27 08:19:46,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3808893.3333333335, ans=0.2 2023-11-27 08:19:46,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3808893.3333333335, ans=0.0 2023-11-27 08:19:53,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.915e+01 9.429e+01 1.009e+02 1.347e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 08:19:53,883 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571350 2023-11-27 08:20:23,684 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6250, loss[loss=0.0825, simple_loss=0.1177, pruned_loss=0.01693, audio_tagging_loss=0.006696, over 15605.00 frames. 
], tot_loss[loss=0.06435, simple_loss=0.08789, pruned_loss=0.01162, audio_tagging_loss=0.008778, over 3040555.01 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:20:31,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3809160.0, ans=0.0 2023-11-27 08:20:32,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3809160.0, ans=0.2 2023-11-27 08:20:45,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. limit=6.0 2023-11-27 08:20:49,673 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571400 2023-11-27 08:21:18,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3809493.3333333335, ans=0.125 2023-11-27 08:21:19,191 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6300, loss[loss=0.05293, simple_loss=0.06708, pruned_loss=0.006775, audio_tagging_loss=0.01261, over 14480.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08847, pruned_loss=0.01167, audio_tagging_loss=0.008857, over 3042663.27 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:21:22,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0 2023-11-27 08:21:34,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.00 vs. limit=15.0 2023-11-27 08:21:41,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3809626.6666666665, ans=0.2 2023-11-27 08:21:46,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.820e+01 9.366e+01 1.014e+02 1.355e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 08:21:46,370 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571450 2023-11-27 08:22:15,170 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6350, loss[loss=0.08788, simple_loss=0.1264, pruned_loss=0.01662, audio_tagging_loss=0.008031, over 15587.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0885, pruned_loss=0.01168, audio_tagging_loss=0.008876, over 3044344.97 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:22:41,861 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571500 2023-11-27 08:22:58,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3810093.3333333335, ans=0.125 2023-11-27 08:22:59,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3810093.3333333335, ans=0.0 2023-11-27 08:23:00,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3810093.3333333335, ans=0.1 2023-11-27 08:23:01,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3810093.3333333335, ans=0.125 2023-11-27 08:23:04,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3810093.3333333335, ans=0.0 2023-11-27 08:23:08,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0 2023-11-27 08:23:11,259 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6400, loss[loss=0.05039, simple_loss=0.05897, pruned_loss=0.009594, audio_tagging_loss=0.01131, over 14682.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08842, pruned_loss=0.01182, audio_tagging_loss=0.00893, over 3037836.79 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:23:13,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3810160.0, ans=0.125 2023-11-27 08:23:37,599 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571550 2023-11-27 08:23:39,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.870e+01 9.357e+01 1.034e+02 1.188e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 08:23:40,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3810293.3333333335, ans=0.0 2023-11-27 08:23:42,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3810293.3333333335, ans=0.125 2023-11-27 08:23:46,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3810360.0, ans=0.2 2023-11-27 08:24:00,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3810426.6666666665, ans=0.05 2023-11-27 08:24:01,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3810426.6666666665, ans=0.125 2023-11-27 08:24:07,248 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6450, loss[loss=0.06152, simple_loss=0.08182, pruned_loss=0.01134, audio_tagging_loss=0.009276, over 15914.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08859, pruned_loss=0.01169, audio_tagging_loss=0.008903, over 3037076.26 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:24:28,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3810626.6666666665, ans=0.125 2023-11-27 08:24:33,665 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571600 2023-11-27 08:24:45,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.32 vs. limit=10.0 2023-11-27 08:25:02,687 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6500, loss[loss=0.06054, simple_loss=0.08903, pruned_loss=0.008866, audio_tagging_loss=0.007163, over 16280.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08911, pruned_loss=0.01175, audio_tagging_loss=0.008784, over 3031905.42 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:25:03,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2023-11-27 08:25:08,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3810826.6666666665, ans=0.125 2023-11-27 08:25:08,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3810826.6666666665, ans=0.125 2023-11-27 08:25:15,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3810893.3333333335, ans=0.07 2023-11-27 08:25:18,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2023-11-27 08:25:20,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3810893.3333333335, ans=0.125 2023-11-27 08:25:30,473 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571650 2023-11-27 08:25:31,462 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 8.956e+01 9.682e+01 1.036e+02 1.299e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 08:25:32,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3810960.0, ans=0.1 2023-11-27 08:25:44,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3811026.6666666665, ans=0.0 2023-11-27 08:25:53,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3811093.3333333335, ans=0.0 2023-11-27 08:25:58,447 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6550, loss[loss=0.05308, simple_loss=0.06584, pruned_loss=0.01123, audio_tagging_loss=0.008927, over 14706.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08896, pruned_loss=0.0117, audio_tagging_loss=0.008619, over 3033275.75 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:26:04,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.54 vs. 
limit=15.0 2023-11-27 08:26:06,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3811160.0, ans=0.0 2023-11-27 08:26:08,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3811160.0, ans=0.125 2023-11-27 08:26:13,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. limit=10.0 2023-11-27 08:26:16,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2023-11-27 08:26:19,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3811226.6666666665, ans=0.2 2023-11-27 08:26:25,736 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571700 2023-11-27 08:26:25,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3811293.3333333335, ans=0.2 2023-11-27 08:26:28,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-11-27 08:26:33,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2023-11-27 08:26:42,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3811426.6666666665, ans=0.125 2023-11-27 08:26:42,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3811426.6666666665, ans=0.2 2023-11-27 08:26:49,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3811426.6666666665, ans=0.125 2023-11-27 08:26:55,412 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6600, loss[loss=0.0716, simple_loss=0.09028, pruned_loss=0.01564, audio_tagging_loss=0.01082, over 14415.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08888, pruned_loss=0.01175, audio_tagging_loss=0.008516, over 3033627.85 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:26:59,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2023-11-27 08:27:07,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3811560.0, ans=0.2 2023-11-27 08:27:18,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3811626.6666666665, ans=0.2 2023-11-27 08:27:20,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.98 vs. 
limit=15.0 2023-11-27 08:27:21,300 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571750 2023-11-27 08:27:23,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.192e+01 9.078e+01 9.642e+01 1.016e+02 1.265e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 08:27:29,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3811693.3333333335, ans=0.2 2023-11-27 08:27:37,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3811693.3333333335, ans=0.125 2023-11-27 08:27:40,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3811760.0, ans=0.125 2023-11-27 08:27:45,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3811760.0, ans=0.025 2023-11-27 08:27:45,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-11-27 08:27:47,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3811760.0, ans=0.0 2023-11-27 08:27:50,315 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6650, loss[loss=0.05707, simple_loss=0.07907, pruned_loss=0.008621, audio_tagging_loss=0.008915, over 15562.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08878, pruned_loss=0.0117, audio_tagging_loss=0.008461, over 3037514.13 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:27:52,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3811826.6666666665, ans=0.125 2023-11-27 08:27:56,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3811826.6666666665, ans=0.125 2023-11-27 08:28:00,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3811893.3333333335, ans=0.0 2023-11-27 08:28:13,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.06 vs. 
limit=10.0 2023-11-27 08:28:18,114 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571800 2023-11-27 08:28:20,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3811960.0, ans=0.125 2023-11-27 08:28:22,707 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:28:24,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3812026.6666666665, ans=0.0 2023-11-27 08:28:27,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3812026.6666666665, ans=0.2 2023-11-27 08:28:28,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3812026.6666666665, ans=0.0 2023-11-27 08:28:29,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3812026.6666666665, ans=0.0 2023-11-27 08:28:33,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3812026.6666666665, ans=0.0 2023-11-27 08:28:46,281 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6700, loss[loss=0.0653, simple_loss=0.08782, pruned_loss=0.01162, audio_tagging_loss=0.009767, over 14537.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08929, pruned_loss=0.0118, audio_tagging_loss=0.008398, over 3043117.13 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:28:48,625 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:29:02,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3812226.6666666665, ans=0.125 2023-11-27 08:29:13,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571850 2023-11-27 08:29:15,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.098e+01 9.634e+01 1.039e+02 1.370e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 08:29:19,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2023-11-27 08:29:27,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3812360.0, ans=0.2 2023-11-27 08:29:39,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3812426.6666666665, ans=0.0 2023-11-27 08:29:42,715 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6750, loss[loss=0.06126, simple_loss=0.09001, pruned_loss=0.009659, audio_tagging_loss=0.006595, over 14790.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08876, pruned_loss=0.0118, audio_tagging_loss=0.008443, over 3036893.22 frames. 
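Note on the ScheduledFloat entries above: each one reports a named training-time hyper-parameter (skip rates, bypass scale minimums, balancer probabilities, dropout probabilities) whose current value ("ans") is determined by batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch, value) breakpoints; the breakpoint values below are invented for illustration and are not taken from this run:

def scheduled_float(batch_count, points):
    # points: (batch, value) breakpoints sorted by batch; the value is
    # clamped to the endpoints outside the covered range and linearly
    # interpolated inside it.
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (b0, v0), (b1, v1) in zip(points, points[1:]):
        if b0 <= batch_count <= b1:
            t = (batch_count - b0) / (b1 - b0)
            return v0 + t * (v1 - v0)

# A skip rate annealed from 0.5 to 0.0 over the first 20k batches would
# have long since reached its floor at batch_count ~ 3.81e6:
print(scheduled_float(3811560.0, [(0.0, 0.5), (20000.0, 0.0)]))  # -> 0.0

This is why so many of the "ans=" values in this span sit at constants such as 0.0, 0.125, or 0.2: at epoch 48 the schedules have converged to their final breakpoints.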
], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:29:48,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3812493.3333333335, ans=0.125 2023-11-27 08:29:49,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3812493.3333333335, ans=0.0 2023-11-27 08:29:56,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5 2023-11-27 08:29:59,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3812560.0, ans=0.5 2023-11-27 08:30:09,241 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571900 2023-11-27 08:30:24,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3812693.3333333335, ans=0.1 2023-11-27 08:30:25,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.39 vs. limit=10.0 2023-11-27 08:30:38,286 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6800, loss[loss=0.07003, simple_loss=0.09712, pruned_loss=0.01384, audio_tagging_loss=0.007636, over 13944.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.0893, pruned_loss=0.01194, audio_tagging_loss=0.008381, over 3031432.23 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:31:05,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.55 vs. limit=8.0 2023-11-27 08:31:05,394 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571950 2023-11-27 08:31:07,402 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 9.218e+01 9.743e+01 1.054e+02 1.281e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 08:31:11,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3813026.6666666665, ans=15.0 2023-11-27 08:31:23,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3813093.3333333335, ans=0.125 2023-11-27 08:31:33,854 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6850, loss[loss=0.06029, simple_loss=0.08095, pruned_loss=0.01004, audio_tagging_loss=0.009775, over 14401.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.0891, pruned_loss=0.01183, audio_tagging_loss=0.008415, over 3035897.10 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:32:01,227 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572000 2023-11-27 08:32:06,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3813293.3333333335, ans=0.05 2023-11-27 08:32:21,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3813426.6666666665, ans=0.125 2023-11-27 08:32:32,564 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6900, loss[loss=0.07848, simple_loss=0.1056, pruned_loss=0.01989, audio_tagging_loss=0.00577, over 14260.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08977, pruned_loss=0.01206, audio_tagging_loss=0.008403, over 3038050.02 frames. 
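Note on the Whitening entries: each compares a per-module statistic against a limit (e.g. "metric=18.51 vs. limit=22.5" above). One plausible reading, sketched below under the assumption that the metric measures how far the group's feature covariance is from isotropic, is the ratio of the mean squared eigenvalue to the squared mean eigenvalue, which is 1.0 for perfectly white features and grows with anisotropy; the real scaling.py may differ in detail.

import torch

def whitening_metric(x, num_groups=1):
    # x: (num_frames, num_channels). Split channels into groups, estimate
    # each group's covariance, then compare the mean squared eigenvalue
    # with the squared mean eigenvalue; 1.0 means perfectly "white".
    n, c = x.shape
    d = c // num_groups
    xg = x.reshape(n, num_groups, d).transpose(0, 1)   # (groups, n, d)
    xg = xg - xg.mean(dim=1, keepdim=True)
    cov = xg.transpose(1, 2) @ xg / n                  # (groups, d, d)
    eigs = torch.linalg.eigvalsh(cov)                  # (groups, d)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

# ~1.0 for truly white features (about 1.2 here from finite-sample noise):
print(whitening_metric(torch.randn(2000, 384)))

A penalty is presumably only applied when the metric exceeds the logged limit, which is consistent with most entries sitting comfortably below it.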
], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:32:37,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3813493.3333333335, ans=0.2 2023-11-27 08:32:52,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0 2023-11-27 08:32:58,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3813626.6666666665, ans=0.0 2023-11-27 08:32:58,994 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572050 2023-11-27 08:32:59,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3813626.6666666665, ans=0.0 2023-11-27 08:33:01,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 8.797e+01 9.367e+01 1.009e+02 1.933e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 08:33:15,880 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:33:17,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3813760.0, ans=0.0 2023-11-27 08:33:22,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3813760.0, ans=15.0 2023-11-27 08:33:28,055 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6950, loss[loss=0.06918, simple_loss=0.1001, pruned_loss=0.01271, audio_tagging_loss=0.006399, over 16043.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08999, pruned_loss=0.01206, audio_tagging_loss=0.008458, over 3037668.28 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:33:31,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3813826.6666666665, ans=0.125 2023-11-27 08:33:53,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2023-11-27 08:33:55,160 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572100 2023-11-27 08:34:20,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3814093.3333333335, ans=0.125 2023-11-27 08:34:21,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3814093.3333333335, ans=0.125 2023-11-27 08:34:23,579 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7000, loss[loss=0.05003, simple_loss=0.07417, pruned_loss=0.005565, audio_tagging_loss=0.007383, over 15307.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08946, pruned_loss=0.01195, audio_tagging_loss=0.008501, over 3033774.17 frames. 
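Note on the "Exclude cut" WARNING above: a one-second AudioSet clip has 100 input frames, which become 23 frames after subsampling, while the placeholder transcript tokenizes to 24 BPE tokens; a transducer has no valid alignment when the label sequence is longer than the encoder output. A sketch of that sanity check, inferred from the numbers in the warning rather than read from train_asr.py:

def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    # Assuming the (pruned) transducer emits at most one non-blank label
    # per encoder frame, a cut is only trainable when it has at least as
    # many post-subsampling frames as tokens.
    return frames_after_subsampling >= num_tokens

# The excluded cut above: 23 frames after subsampling vs. 24 tokens.
print(keep_cut(23, 24))  # False -> excluded from training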
], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:34:24,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-11-27 08:34:43,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3814226.6666666665, ans=0.125 2023-11-27 08:34:50,157 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572150 2023-11-27 08:34:52,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.213e+01 9.596e+01 1.029e+02 1.427e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-27 08:35:00,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3814360.0, ans=0.125 2023-11-27 08:35:19,307 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7050, loss[loss=0.07005, simple_loss=0.103, pruned_loss=0.01157, audio_tagging_loss=0.006991, over 15997.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08982, pruned_loss=0.01215, audio_tagging_loss=0.008507, over 3034563.71 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:35:34,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3814560.0, ans=0.0 2023-11-27 08:35:45,987 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572200 2023-11-27 08:36:04,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3814760.0, ans=0.125 2023-11-27 08:36:07,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3814760.0, ans=0.125 2023-11-27 08:36:14,651 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7100, loss[loss=0.05743, simple_loss=0.07332, pruned_loss=0.01208, audio_tagging_loss=0.008685, over 14159.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08912, pruned_loss=0.01196, audio_tagging_loss=0.008568, over 3031412.26 frames. 
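Note on the optim.py "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." entries: in every such line in this span the threshold is exactly twice the logged median (e.g. 2 x 9.596e+01 = 1.919e+02 in the entry just above), which suggests the clip threshold is clipping_scale times a running median of recent total gradient norms. A sketch under that assumption, not the actual optim.py; the window size is a guess:

from collections import deque
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale=2.0, window=200):
        self.clipping_scale = clipping_scale
        self.history = deque(maxlen=window)  # recent total grad norms

    def step(self, parameters):
        parameters = [p for p in parameters if p.grad is not None]
        total = torch.norm(torch.stack([p.grad.norm() for p in parameters]))
        self.history.append(total.item())
        hist = torch.tensor(list(self.history))
        # min / 25% / median / 75% / max, as printed in the log lines
        quartiles = torch.quantile(
            hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * quartiles[2].item()
        if total.item() > threshold:  # counted in "percent-clipped"
            for p in parameters:
                p.grad.mul_(threshold / total)
        return quartiles, threshold

percent-clipped stays at 0.0 throughout this span: the quartile spread is tight, so no batch's gradient norm exceeds twice the running median.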
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:36:21,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3814826.6666666665, ans=0.1 2023-11-27 08:36:21,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3814826.6666666665, ans=0.125 2023-11-27 08:36:27,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3814893.3333333335, ans=0.1 2023-11-27 08:36:34,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3814893.3333333335, ans=0.125 2023-11-27 08:36:36,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3814960.0, ans=0.125 2023-11-27 08:36:42,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572250 2023-11-27 08:36:44,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.212e+01 9.022e+01 9.654e+01 1.030e+02 1.274e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 08:36:47,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3814960.0, ans=0.09899494936611666 2023-11-27 08:37:11,089 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7150, loss[loss=0.0597, simple_loss=0.08756, pruned_loss=0.008344, audio_tagging_loss=0.007576, over 13894.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08971, pruned_loss=0.01192, audio_tagging_loss=0.008572, over 3036559.27 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:37:16,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3815160.0, ans=0.1 2023-11-27 08:37:29,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3815226.6666666665, ans=0.125 2023-11-27 08:37:33,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0 2023-11-27 08:37:38,048 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572300 2023-11-27 08:37:40,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3815293.3333333335, ans=0.2 2023-11-27 08:37:43,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3815360.0, ans=0.05 2023-11-27 08:37:55,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2023-11-27 08:38:07,753 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7200, loss[loss=0.05193, simple_loss=0.06361, pruned_loss=0.009159, audio_tagging_loss=0.01097, over 15320.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.09017, pruned_loss=0.01184, audio_tagging_loss=0.008627, over 3036434.44 frames. 
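Note on the bracketed per-batch losses: across this whole span they are consistent with the total being a fixed-weight combination of the three parts, loss ~ 0.5 x simple_loss + pruned_loss + audio_tagging_loss (e.g. batch 7150 above: 0.5 x 0.08756 + 0.008344 + 0.007576 = 0.0597). A sketch of that combination; the 0.5 and 1.0 weights are inferred from the logged values, not read from the training code:

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Total per-batch objective as reconstructed from the logged numbers.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 7150 above: loss=0.0597.
print(round(combined_loss(0.08756, 0.008344, 0.007576), 4))  # 0.0597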
], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:38:10,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3815493.3333333335, ans=0.2 2023-11-27 08:38:13,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3815493.3333333335, ans=0.0 2023-11-27 08:38:15,446 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:38:16,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3815493.3333333335, ans=0.2 2023-11-27 08:38:16,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3815493.3333333335, ans=0.125 2023-11-27 08:38:33,737 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572350 2023-11-27 08:38:35,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.066e+01 9.027e+01 9.481e+01 1.011e+02 1.295e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 08:38:48,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3815693.3333333335, ans=0.125 2023-11-27 08:39:02,504 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7250, loss[loss=0.05459, simple_loss=0.07778, pruned_loss=0.00715, audio_tagging_loss=0.008549, over 15727.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08936, pruned_loss=0.01159, audio_tagging_loss=0.008735, over 3043216.01 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:39:11,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=22.5 2023-11-27 08:39:13,263 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:39:27,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3815960.0, ans=0.0 2023-11-27 08:39:27,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-11-27 08:39:29,497 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572400 2023-11-27 08:39:30,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3815960.0, ans=0.125 2023-11-27 08:39:42,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3816026.6666666665, ans=0.0 2023-11-27 08:39:48,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3816093.3333333335, ans=0.125 2023-11-27 08:39:58,252 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7300, loss[loss=0.07944, simple_loss=0.1162, pruned_loss=0.01531, audio_tagging_loss=0.006006, over 15740.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08918, pruned_loss=0.01187, audio_tagging_loss=0.00866, over 3038361.27 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:40:02,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. 
limit=6.0 2023-11-27 08:40:05,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3816160.0, ans=0.0 2023-11-27 08:40:06,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3816160.0, ans=0.0 2023-11-27 08:40:19,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3816226.6666666665, ans=0.125 2023-11-27 08:40:24,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=15.0 2023-11-27 08:40:25,398 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572450 2023-11-27 08:40:27,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 9.262e+01 9.740e+01 1.057e+02 1.335e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 08:40:33,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3816360.0, ans=0.125 2023-11-27 08:40:37,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3816360.0, ans=0.2 2023-11-27 08:40:54,390 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7350, loss[loss=0.07045, simple_loss=0.1049, pruned_loss=0.01079, audio_tagging_loss=0.007202, over 16856.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08876, pruned_loss=0.01169, audio_tagging_loss=0.0085, over 3047671.11 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:41:09,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3816560.0, ans=0.125 2023-11-27 08:41:20,316 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572500 2023-11-27 08:41:30,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3816693.3333333335, ans=0.0 2023-11-27 08:41:49,716 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7400, loss[loss=0.0556, simple_loss=0.08026, pruned_loss=0.00834, audio_tagging_loss=0.007135, over 13688.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08785, pruned_loss=0.01164, audio_tagging_loss=0.00846, over 3043369.13 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:42:11,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2023-11-27 08:42:16,179 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572550 2023-11-27 08:42:19,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.911e+01 9.063e+01 9.701e+01 1.022e+02 1.505e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-27 08:42:23,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3817026.6666666665, ans=0.125 2023-11-27 08:42:29,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3817026.6666666665, ans=0.125 2023-11-27 08:42:37,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.91 vs. 
limit=15.0 2023-11-27 08:42:39,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3817093.3333333335, ans=0.125 2023-11-27 08:42:39,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3817093.3333333335, ans=0.125 2023-11-27 08:42:44,744 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7450, loss[loss=0.05207, simple_loss=0.0651, pruned_loss=0.008772, audio_tagging_loss=0.01075, over 15443.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.0888, pruned_loss=0.01184, audio_tagging_loss=0.008353, over 3048301.21 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:42:55,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3817226.6666666665, ans=0.05 2023-11-27 08:43:06,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3817226.6666666665, ans=0.0 2023-11-27 08:43:10,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.52 vs. limit=10.0 2023-11-27 08:43:12,333 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572600 2023-11-27 08:43:33,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3817426.6666666665, ans=0.125 2023-11-27 08:43:39,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3817426.6666666665, ans=0.0 2023-11-27 08:43:41,242 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7500, loss[loss=0.07653, simple_loss=0.1051, pruned_loss=0.01625, audio_tagging_loss=0.007717, over 14369.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08901, pruned_loss=0.01179, audio_tagging_loss=0.008319, over 3047565.07 frames. 
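Note on the tot_loss[...] summaries: they are not simple epoch sums, since their frame count hovers around 3.03e6 to 3.06e6 for hundreds of batches instead of growing without bound. That behavior matches an exponentially decayed, frame-weighted accumulator: with roughly 15k frames per batch and a decay of 1 - 1/200, the steady-state frame count is 15,000 x 200 = 3.0e6. A sketch under that assumption; the decay constant of 200 batches is a guess that reproduces the logged scale, not a value taken from the code:

class RunningLossTracker:
    def __init__(self, decay_batches=200):
        self.alpha = 1.0 - 1.0 / decay_batches
        self.loss_sum = 0.0   # decayed sum of (mean batch loss * frames)
        self.frames = 0.0     # decayed sum of frames

    def update(self, batch_mean_loss, batch_frames):
        self.loss_sum = self.alpha * self.loss_sum + batch_mean_loss * batch_frames
        self.frames = self.alpha * self.frames + batch_frames

    @property
    def value(self):
        return self.loss_sum / self.frames

tracker = RunningLossTracker()
for _ in range(2000):
    tracker.update(0.065, 15000)
# -> 0.065 over ~3.0e6 frames, matching the scale printed above.
print(round(tracker.value, 5), round(tracker.frames))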
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:43:42,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3817493.3333333335, ans=0.125 2023-11-27 08:43:42,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3817493.3333333335, ans=0.0 2023-11-27 08:43:44,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3817493.3333333335, ans=0.0 2023-11-27 08:43:47,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3817493.3333333335, ans=0.1 2023-11-27 08:44:05,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3817626.6666666665, ans=0.1 2023-11-27 08:44:07,884 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572650 2023-11-27 08:44:11,555 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.822e+01 9.501e+01 1.047e+02 1.367e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 08:44:22,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3817693.3333333335, ans=0.2 2023-11-27 08:44:37,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2023-11-27 08:44:37,571 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7550, loss[loss=0.05582, simple_loss=0.07857, pruned_loss=0.007538, audio_tagging_loss=0.008995, over 15219.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08917, pruned_loss=0.01183, audio_tagging_loss=0.008296, over 3044533.77 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:44:39,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3817826.6666666665, ans=0.2 2023-11-27 08:44:55,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2023-11-27 08:45:02,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3817960.0, ans=0.125 2023-11-27 08:45:03,362 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572700 2023-11-27 08:45:06,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3817960.0, ans=0.125 2023-11-27 08:45:18,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3818026.6666666665, ans=0.025 2023-11-27 08:45:22,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3818093.3333333335, ans=0.0 2023-11-27 08:45:32,593 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7600, loss[loss=0.0585, simple_loss=0.07594, pruned_loss=0.009304, audio_tagging_loss=0.01123, over 15292.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08955, pruned_loss=0.01186, audio_tagging_loss=0.008344, over 3046605.00 frames. 
], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:45:35,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2023-11-27 08:45:37,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3818160.0, ans=0.0 2023-11-27 08:45:41,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3818160.0, ans=0.1 2023-11-27 08:45:51,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=12.0 2023-11-27 08:45:59,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2023-11-27 08:45:59,831 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572750 2023-11-27 08:46:02,836 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.740e+01 9.501e+01 1.030e+02 1.304e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 08:46:07,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3818360.0, ans=0.0 2023-11-27 08:46:20,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3818426.6666666665, ans=0.125 2023-11-27 08:46:21,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3818426.6666666665, ans=0.125 2023-11-27 08:46:28,095 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7650, loss[loss=0.06724, simple_loss=0.1003, pruned_loss=0.008735, audio_tagging_loss=0.008379, over 16385.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08885, pruned_loss=0.01153, audio_tagging_loss=0.0084, over 3045700.90 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:46:28,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3818493.3333333335, ans=0.025 2023-11-27 08:46:40,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3818560.0, ans=0.125 2023-11-27 08:46:55,253 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572800 2023-11-27 08:47:18,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3818760.0, ans=0.0 2023-11-27 08:47:24,441 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7700, loss[loss=0.06558, simple_loss=0.08527, pruned_loss=0.0133, audio_tagging_loss=0.009647, over 14002.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08902, pruned_loss=0.01159, audio_tagging_loss=0.008459, over 3044541.70 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:47:26,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3818826.6666666665, ans=0.125 2023-11-27 08:47:33,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3818826.6666666665, ans=0.125 2023-11-27 08:47:46,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3818960.0, ans=0.0 2023-11-27 08:47:50,529 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572850 2023-11-27 08:47:53,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 9.058e+01 9.794e+01 1.057e+02 1.473e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-27 08:48:06,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3819026.6666666665, ans=0.0 2023-11-27 08:48:11,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3819093.3333333335, ans=0.125 2023-11-27 08:48:19,686 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7750, loss[loss=0.06896, simple_loss=0.09782, pruned_loss=0.01399, audio_tagging_loss=0.006064, over 16628.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08946, pruned_loss=0.0117, audio_tagging_loss=0.008466, over 3047623.98 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:48:30,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3819226.6666666665, ans=0.2 2023-11-27 08:48:41,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3819293.3333333335, ans=0.0 2023-11-27 08:48:47,317 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572900 2023-11-27 08:48:47,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3819293.3333333335, ans=0.1 2023-11-27 08:48:53,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.31 vs. limit=5.0 2023-11-27 08:48:58,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2023-11-27 08:49:03,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3819426.6666666665, ans=0.125 2023-11-27 08:49:15,355 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7800, loss[loss=0.06189, simple_loss=0.0919, pruned_loss=0.008586, audio_tagging_loss=0.007352, over 15992.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08977, pruned_loss=0.01178, audio_tagging_loss=0.008486, over 3042490.17 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:49:16,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3819493.3333333335, ans=0.1 2023-11-27 08:49:42,513 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572950 2023-11-27 08:49:45,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 9.181e+01 9.727e+01 1.046e+02 1.272e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 08:50:09,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3819760.0, ans=0.0 2023-11-27 08:50:11,736 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7850, loss[loss=0.07713, simple_loss=0.09876, pruned_loss=0.01834, audio_tagging_loss=0.009406, over 15854.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.09025, pruned_loss=0.01205, audio_tagging_loss=0.008507, over 3046280.33 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:50:15,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3819826.6666666665, ans=0.1 2023-11-27 08:50:16,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2023-11-27 08:50:38,033 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573000 2023-11-27 08:50:42,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3819960.0, ans=0.2 2023-11-27 08:51:07,219 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7900, loss[loss=0.0742, simple_loss=0.1052, pruned_loss=0.01513, audio_tagging_loss=0.006491, over 15313.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09052, pruned_loss=0.01208, audio_tagging_loss=0.00858, over 3047138.03 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:51:09,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3820160.0, ans=0.125 2023-11-27 08:51:14,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2023-11-27 08:51:28,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2023-11-27 08:51:34,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573050 2023-11-27 08:51:37,299 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.827e+01 9.140e+01 9.856e+01 1.052e+02 1.450e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 08:51:51,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3820426.6666666665, ans=0.125 2023-11-27 08:51:54,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3820426.6666666665, ans=0.2 2023-11-27 08:52:02,800 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7950, loss[loss=0.06024, simple_loss=0.08041, pruned_loss=0.01152, audio_tagging_loss=0.008515, over 15810.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09078, pruned_loss=0.01207, audio_tagging_loss=0.008701, over 3054094.94 frames. 
], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:52:02,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3820493.3333333335, ans=0.05 2023-11-27 08:52:07,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3820493.3333333335, ans=0.125 2023-11-27 08:52:08,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3820493.3333333335, ans=0.0 2023-11-27 08:52:17,622 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:52:29,794 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573100 2023-11-27 08:52:40,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3820693.3333333335, ans=0.125 2023-11-27 08:52:42,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3820693.3333333335, ans=0.125 2023-11-27 08:52:58,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0 2023-11-27 08:52:59,097 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8000, loss[loss=0.05603, simple_loss=0.07621, pruned_loss=0.008066, audio_tagging_loss=0.009859, over 15803.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08923, pruned_loss=0.01194, audio_tagging_loss=0.008784, over 3048854.32 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:53:07,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3820826.6666666665, ans=0.125 2023-11-27 08:53:25,488 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573150 2023-11-27 08:53:28,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.923e+01 9.617e+01 1.018e+02 1.242e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 08:53:35,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3821026.6666666665, ans=0.2 2023-11-27 08:53:51,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3821093.3333333335, ans=0.125 2023-11-27 08:53:52,522 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:53:52,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-11-27 08:53:54,532 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8050, loss[loss=0.06136, simple_loss=0.08421, pruned_loss=0.00923, audio_tagging_loss=0.01002, over 14698.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.0887, pruned_loss=0.01186, audio_tagging_loss=0.008816, over 3048264.48 frames. 
], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:54:07,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3821226.6666666665, ans=0.125 2023-11-27 08:54:21,095 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573200 2023-11-27 08:54:28,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0 2023-11-27 08:54:41,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3821426.6666666665, ans=0.125 2023-11-27 08:54:46,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=12.0 2023-11-27 08:54:50,354 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8100, loss[loss=0.04669, simple_loss=0.05655, pruned_loss=0.01087, audio_tagging_loss=0.007547, over 14048.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08763, pruned_loss=0.0117, audio_tagging_loss=0.008852, over 3048573.50 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:54:54,839 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:55:14,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.10 vs. limit=10.0 2023-11-27 08:55:17,018 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573250 2023-11-27 08:55:20,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3821626.6666666665, ans=0.125 2023-11-27 08:55:21,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 8.982e+01 9.731e+01 1.040e+02 1.240e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 08:55:34,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3821760.0, ans=0.035 2023-11-27 08:55:34,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=15.0 2023-11-27 08:55:46,154 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8150, loss[loss=0.06387, simple_loss=0.08089, pruned_loss=0.01231, audio_tagging_loss=0.01112, over 14442.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08836, pruned_loss=0.0117, audio_tagging_loss=0.008739, over 3051266.58 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:55:48,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3821826.6666666665, ans=0.1 2023-11-27 08:55:52,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0 2023-11-27 08:56:00,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3821893.3333333335, ans=0.0 2023-11-27 08:56:13,214 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573300 2023-11-27 08:56:21,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.57 vs. 
limit=15.0 2023-11-27 08:56:40,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0 2023-11-27 08:56:41,847 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8200, loss[loss=0.07919, simple_loss=0.09373, pruned_loss=0.02173, audio_tagging_loss=0.01059, over 15323.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08781, pruned_loss=0.01157, audio_tagging_loss=0.008732, over 3051502.84 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:56:42,912 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:56:43,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.18 vs. limit=12.0 2023-11-27 08:56:50,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2023-11-27 08:56:53,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3822226.6666666665, ans=0.125 2023-11-27 08:57:02,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3822226.6666666665, ans=0.0 2023-11-27 08:57:08,866 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573350 2023-11-27 08:57:13,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 9.014e+01 9.648e+01 1.048e+02 1.501e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 08:57:17,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3822360.0, ans=0.0 2023-11-27 08:57:26,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3822426.6666666665, ans=0.0 2023-11-27 08:57:38,045 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8250, loss[loss=0.06099, simple_loss=0.07888, pruned_loss=0.01094, audio_tagging_loss=0.01061, over 15817.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08923, pruned_loss=0.01169, audio_tagging_loss=0.008581, over 3053789.92 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:57:43,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3822493.3333333335, ans=0.1 2023-11-27 08:57:51,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3822560.0, ans=0.0 2023-11-27 08:57:52,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3822560.0, ans=0.125 2023-11-27 08:57:56,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3822560.0, ans=0.125 2023-11-27 08:58:04,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3822626.6666666665, ans=0.125 2023-11-27 08:58:04,952 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573400 2023-11-27 08:58:21,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3822693.3333333335, ans=0.125 2023-11-27 08:58:22,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3822760.0, ans=0.09899494936611666 2023-11-27 08:58:34,676 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8300, loss[loss=0.06466, simple_loss=0.08489, pruned_loss=0.01295, audio_tagging_loss=0.009258, over 15142.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08888, pruned_loss=0.01175, audio_tagging_loss=0.008595, over 3048311.24 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:58:39,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3822826.6666666665, ans=0.1 2023-11-27 08:58:47,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.42 vs. limit=22.5 2023-11-27 08:58:53,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3822893.3333333335, ans=0.2 2023-11-27 08:59:01,356 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573450 2023-11-27 08:59:05,534 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.971e+01 9.543e+01 1.035e+02 1.385e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 08:59:06,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3823026.6666666665, ans=0.05 2023-11-27 08:59:21,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=15.0 2023-11-27 08:59:30,193 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8350, loss[loss=0.05952, simple_loss=0.0832, pruned_loss=0.0102, audio_tagging_loss=0.007714, over 15643.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08822, pruned_loss=0.01162, audio_tagging_loss=0.00856, over 3045997.33 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:59:34,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3823160.0, ans=0.0 2023-11-27 08:59:41,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3823226.6666666665, ans=0.0 2023-11-27 08:59:56,898 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573500 2023-11-27 09:00:05,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3823360.0, ans=0.0 2023-11-27 09:00:13,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3823426.6666666665, ans=0.1 2023-11-27 09:00:22,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3823426.6666666665, ans=0.1 2023-11-27 09:00:25,932 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8400, loss[loss=0.09029, simple_loss=0.1254, pruned_loss=0.02075, audio_tagging_loss=0.006858, over 16411.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08863, pruned_loss=0.01172, audio_tagging_loss=0.008569, over 3042757.60 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:00:27,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3823493.3333333335, ans=0.125 2023-11-27 09:00:35,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3823493.3333333335, ans=0.125 2023-11-27 09:00:39,221 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:00:52,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573550 2023-11-27 09:00:56,850 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.941e+01 9.645e+01 1.032e+02 1.251e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 09:01:08,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3823693.3333333335, ans=0.2 2023-11-27 09:01:21,027 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8450, loss[loss=0.07463, simple_loss=0.1071, pruned_loss=0.0141, audio_tagging_loss=0.006962, over 14964.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08914, pruned_loss=0.01183, audio_tagging_loss=0.008576, over 3047339.43 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:01:23,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3823826.6666666665, ans=0.05 2023-11-27 09:01:47,444 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573600 2023-11-27 09:01:59,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3824026.6666666665, ans=0.2 2023-11-27 09:02:16,961 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8500, loss[loss=0.06835, simple_loss=0.0862, pruned_loss=0.01854, audio_tagging_loss=0.006707, over 14667.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08938, pruned_loss=0.01182, audio_tagging_loss=0.008475, over 3041524.97 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:02:42,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3824293.3333333335, ans=0.0 2023-11-27 09:02:43,596 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573650 2023-11-27 09:02:46,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-27 09:02:48,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 9.075e+01 9.563e+01 1.041e+02 1.324e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 09:02:52,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2023-11-27 09:02:57,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3824360.0, ans=0.125 2023-11-27 09:03:12,106 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8550, loss[loss=0.06089, simple_loss=0.08455, pruned_loss=0.01014, audio_tagging_loss=0.008482, over 14833.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.09024, pruned_loss=0.01189, audio_tagging_loss=0.008498, over 3058895.19 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:03:18,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3824493.3333333335, ans=0.0 2023-11-27 09:03:35,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3824626.6666666665, ans=0.125 2023-11-27 09:03:36,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.18 vs. limit=10.0 2023-11-27 09:03:39,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573700 2023-11-27 09:03:54,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3824693.3333333335, ans=0.125 2023-11-27 09:04:00,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0 2023-11-27 09:04:08,392 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8600, loss[loss=0.0643, simple_loss=0.08323, pruned_loss=0.01214, audio_tagging_loss=0.01054, over 16115.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09016, pruned_loss=0.01194, audio_tagging_loss=0.008535, over 3060107.03 frames. 
], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:04:32,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3824960.0, ans=0.125 2023-11-27 09:04:34,958 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573750 2023-11-27 09:04:39,602 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.143e+01 9.907e+01 1.055e+02 1.409e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-27 09:04:41,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3825026.6666666665, ans=0.1 2023-11-27 09:04:46,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3825026.6666666665, ans=0.2 2023-11-27 09:04:49,441 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:05:00,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3825093.3333333335, ans=0.1 2023-11-27 09:05:01,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3825093.3333333335, ans=0.1 2023-11-27 09:05:04,579 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8650, loss[loss=0.0614, simple_loss=0.07935, pruned_loss=0.01241, audio_tagging_loss=0.00931, over 15646.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.09002, pruned_loss=0.012, audio_tagging_loss=0.008555, over 3059543.96 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:05:12,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3825160.0, ans=0.2 2023-11-27 09:05:15,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3825226.6666666665, ans=0.0 2023-11-27 09:05:15,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3825226.6666666665, ans=10.0 2023-11-27 09:05:18,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3825226.6666666665, ans=0.0 2023-11-27 09:05:30,637 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573800 2023-11-27 09:05:47,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3825360.0, ans=0.125 2023-11-27 09:05:49,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2023-11-27 09:06:00,047 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8700, loss[loss=0.06005, simple_loss=0.07268, pruned_loss=0.01399, audio_tagging_loss=0.009715, over 14362.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08953, pruned_loss=0.01205, audio_tagging_loss=0.008689, over 3057187.06 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:06:27,594 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573850 2023-11-27 09:06:32,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3825626.6666666665, ans=0.125 2023-11-27 09:06:32,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.211e+01 9.788e+01 1.039e+02 1.317e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-27 09:06:42,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3825693.3333333335, ans=0.125 2023-11-27 09:06:44,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=15.0 2023-11-27 09:06:45,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3825760.0, ans=0.1 2023-11-27 09:06:55,783 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8750, loss[loss=0.0751, simple_loss=0.1039, pruned_loss=0.01647, audio_tagging_loss=0.00666, over 14334.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08975, pruned_loss=0.0121, audio_tagging_loss=0.00872, over 3048029.85 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:06:56,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.39 vs. limit=15.0 2023-11-27 09:07:15,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3825893.3333333335, ans=0.0 2023-11-27 09:07:18,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3825960.0, ans=0.125 2023-11-27 09:07:22,901 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573900 2023-11-27 09:07:36,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3826026.6666666665, ans=0.125 2023-11-27 09:07:50,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3826093.3333333335, ans=0.1 2023-11-27 09:07:52,414 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8800, loss[loss=0.0584, simple_loss=0.07428, pruned_loss=0.009972, audio_tagging_loss=0.01129, over 15241.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09016, pruned_loss=0.01227, audio_tagging_loss=0.008789, over 3040832.19 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:08:13,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3826293.3333333335, ans=0.1 2023-11-27 09:08:14,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3826293.3333333335, ans=0.0 2023-11-27 09:08:14,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.84 vs. 
limit=15.0 2023-11-27 09:08:18,405 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573950 2023-11-27 09:08:23,599 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.888e+01 9.347e+01 1.002e+02 1.077e+02 1.340e+02, threshold=2.003e+02, percent-clipped=0.0 2023-11-27 09:08:23,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3826360.0, ans=0.125 2023-11-27 09:08:47,537 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8850, loss[loss=0.06629, simple_loss=0.09258, pruned_loss=0.01148, audio_tagging_loss=0.008521, over 15474.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09043, pruned_loss=0.01233, audio_tagging_loss=0.008794, over 3036625.77 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:08:58,679 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:09:00,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3826560.0, ans=0.125 2023-11-27 09:09:13,959 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574000 2023-11-27 09:09:29,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2023-11-27 09:09:29,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-27 09:09:32,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3826760.0, ans=0.125 2023-11-27 09:09:42,719 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8900, loss[loss=0.05667, simple_loss=0.07794, pruned_loss=0.01039, audio_tagging_loss=0.007311, over 15674.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09026, pruned_loss=0.01221, audio_tagging_loss=0.008618, over 3042768.10 frames. 
], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:09:56,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3826893.3333333335, ans=0.0 2023-11-27 09:10:02,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3826893.3333333335, ans=0.125 2023-11-27 09:10:04,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3826960.0, ans=0.125 2023-11-27 09:10:08,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3826960.0, ans=0.125 2023-11-27 09:10:09,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3826960.0, ans=0.125 2023-11-27 09:10:09,991 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574050 2023-11-27 09:10:14,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3826960.0, ans=0.125 2023-11-27 09:10:16,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.150e+01 9.024e+01 9.616e+01 1.025e+02 1.217e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 09:10:30,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3827093.3333333335, ans=0.0 2023-11-27 09:10:31,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3827093.3333333335, ans=0.0 2023-11-27 09:10:38,544 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8950, loss[loss=0.04507, simple_loss=0.06037, pruned_loss=0.006611, audio_tagging_loss=0.008273, over 16316.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09011, pruned_loss=0.01222, audio_tagging_loss=0.008427, over 3043464.12 frames. ], batch size: 64, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:10:46,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. 
limit=15.0 2023-11-27 09:10:47,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3827160.0, ans=0.125 2023-11-27 09:10:49,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3827226.6666666665, ans=0.1 2023-11-27 09:10:51,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3827226.6666666665, ans=0.04949747468305833 2023-11-27 09:11:03,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3827293.3333333335, ans=0.2 2023-11-27 09:11:05,622 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574100 2023-11-27 09:11:12,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3827360.0, ans=0.1 2023-11-27 09:11:29,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3827426.6666666665, ans=0.2 2023-11-27 09:11:34,688 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9000, loss[loss=0.05799, simple_loss=0.0796, pruned_loss=0.008537, audio_tagging_loss=0.009655, over 16696.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09064, pruned_loss=0.01227, audio_tagging_loss=0.008416, over 3043881.26 frames. ], batch size: 63, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:11:34,689 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 09:11:52,557 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8210, 5.8567, 5.8831, 5.8789], device='cuda:1') 2023-11-27 09:12:07,533 INFO [train_asr.py:1267] (1/4) Epoch 48, validation: loss=0.05893, simple_loss=0.05035, pruned_loss=0.005253, audio_tagging_loss=0.0285, over 4681554.00 frames. 2023-11-27 09:12:07,533 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 09:12:09,928 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:12:30,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3827626.6666666665, ans=0.125 2023-11-27 09:12:33,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3827626.6666666665, ans=0.5 2023-11-27 09:12:34,562 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574150 2023-11-27 09:12:40,848 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.087e+01 9.718e+01 1.070e+02 1.602e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 09:12:44,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3827693.3333333335, ans=0.125 2023-11-27 09:12:46,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3827693.3333333335, ans=0.125 2023-11-27 09:13:03,709 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9050, loss[loss=0.08075, simple_loss=0.1073, pruned_loss=0.02006, audio_tagging_loss=0.007026, over 14873.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09042, pruned_loss=0.01223, audio_tagging_loss=0.008326, over 3034516.76 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:13:30,162 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574200 2023-11-27 09:13:32,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3827960.0, ans=0.125 2023-11-27 09:13:37,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2023-11-27 09:13:37,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2023-11-27 09:13:38,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2023-11-27 09:13:59,504 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9100, loss[loss=0.058, simple_loss=0.08224, pruned_loss=0.006631, audio_tagging_loss=0.01024, over 16253.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08988, pruned_loss=0.01199, audio_tagging_loss=0.008333, over 3043762.24 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:14:04,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3828160.0, ans=0.0 2023-11-27 09:14:11,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3828226.6666666665, ans=0.125 2023-11-27 09:14:15,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3828226.6666666665, ans=0.125 2023-11-27 09:14:26,657 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574250 2023-11-27 09:14:26,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3828293.3333333335, ans=0.0 2023-11-27 09:14:32,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.030e+01 9.534e+01 1.010e+02 1.225e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 09:14:39,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.52 vs. limit=10.0 2023-11-27 09:14:55,586 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9150, loss[loss=0.04989, simple_loss=0.06417, pruned_loss=0.008702, audio_tagging_loss=0.009108, over 14711.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08943, pruned_loss=0.01203, audio_tagging_loss=0.008429, over 3047405.34 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:15:00,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3828493.3333333335, ans=0.125 2023-11-27 09:15:14,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. 
limit=22.5 2023-11-27 09:15:22,662 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574300 2023-11-27 09:15:34,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3828693.3333333335, ans=0.0 2023-11-27 09:15:42,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2023-11-27 09:15:51,840 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9200, loss[loss=0.04072, simple_loss=0.05058, pruned_loss=0.007138, audio_tagging_loss=0.008299, over 13636.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08941, pruned_loss=0.01219, audio_tagging_loss=0.008417, over 3046021.14 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:16:04,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=12.0 2023-11-27 09:16:11,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3828893.3333333335, ans=0.0 2023-11-27 09:16:16,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3828960.0, ans=0.04949747468305833 2023-11-27 09:16:18,613 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574350 2023-11-27 09:16:22,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3828960.0, ans=0.125 2023-11-27 09:16:24,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3829026.6666666665, ans=0.07 2023-11-27 09:16:24,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 9.042e+01 9.589e+01 1.020e+02 1.357e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 09:16:35,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3829093.3333333335, ans=0.035 2023-11-27 09:16:47,644 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9250, loss[loss=0.07359, simple_loss=0.09709, pruned_loss=0.01844, audio_tagging_loss=0.006605, over 15259.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08886, pruned_loss=0.01217, audio_tagging_loss=0.008539, over 3050771.17 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:16:55,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3829160.0, ans=0.125 2023-11-27 09:17:14,306 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574400 2023-11-27 09:17:18,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3829293.3333333335, ans=0.1 2023-11-27 09:17:29,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3829360.0, ans=0.1 2023-11-27 09:17:43,315 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9300, loss[loss=0.05531, simple_loss=0.07806, pruned_loss=0.007682, audio_tagging_loss=0.008593, over 14813.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08903, pruned_loss=0.01217, audio_tagging_loss=0.008578, over 3049169.72 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:18:00,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.87 vs. limit=15.0 2023-11-27 09:18:01,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3829560.0, ans=0.2 2023-11-27 09:18:09,933 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574450 2023-11-27 09:18:16,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.342e+01 9.838e+01 1.063e+02 1.386e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 09:18:22,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3829693.3333333335, ans=0.125 2023-11-27 09:18:29,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3829760.0, ans=0.035 2023-11-27 09:18:38,916 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9350, loss[loss=0.05396, simple_loss=0.07529, pruned_loss=0.009268, audio_tagging_loss=0.007045, over 15068.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08884, pruned_loss=0.01198, audio_tagging_loss=0.008554, over 3056888.40 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:18:43,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3829826.6666666665, ans=0.125 2023-11-27 09:18:43,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3829826.6666666665, ans=0.0 2023-11-27 09:18:43,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3829826.6666666665, ans=0.5 2023-11-27 09:18:48,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3829826.6666666665, ans=0.125 2023-11-27 09:18:55,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3829893.3333333335, ans=0.125 2023-11-27 09:19:05,629 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574500 2023-11-27 09:19:26,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3830093.3333333335, ans=0.09899494936611666 2023-11-27 09:19:34,654 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9400, loss[loss=0.06584, simple_loss=0.09015, pruned_loss=0.01377, audio_tagging_loss=0.007001, over 15202.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08861, pruned_loss=0.01204, audio_tagging_loss=0.008581, over 3049521.10 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:19:34,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3830160.0, ans=0.2 2023-11-27 09:19:37,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3830160.0, ans=0.0 2023-11-27 09:19:47,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.30 vs. 
limit=6.0 2023-11-27 09:19:56,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3830293.3333333335, ans=0.1 2023-11-27 09:20:01,306 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574550 2023-11-27 09:20:05,127 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:20:09,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.781e+01 8.905e+01 9.680e+01 1.031e+02 1.220e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 09:20:12,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2023-11-27 09:20:26,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=15.0 2023-11-27 09:20:29,220 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:20:30,794 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9450, loss[loss=0.07117, simple_loss=0.09731, pruned_loss=0.01429, audio_tagging_loss=0.008219, over 15016.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08829, pruned_loss=0.01199, audio_tagging_loss=0.008707, over 3048350.06 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:20:32,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3830493.3333333335, ans=0.0 2023-11-27 09:20:38,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3830493.3333333335, ans=0.125 2023-11-27 09:20:45,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3830560.0, ans=0.2 2023-11-27 09:20:48,503 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:20:53,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3830626.6666666665, ans=0.1 2023-11-27 09:20:54,346 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:20:57,561 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574600 2023-11-27 09:21:00,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3830626.6666666665, ans=0.0 2023-11-27 09:21:01,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3830626.6666666665, ans=0.2 2023-11-27 09:21:05,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3830693.3333333335, ans=0.5 2023-11-27 09:21:21,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3830760.0, ans=0.0 2023-11-27 09:21:26,373 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9500, loss[loss=0.0609, simple_loss=0.09073, pruned_loss=0.008173, audio_tagging_loss=0.00736, over 16047.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08861, pruned_loss=0.01197, audio_tagging_loss=0.008681, over 3050478.97 frames. 
], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:21:28,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3830826.6666666665, ans=0.0 2023-11-27 09:21:49,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3830960.0, ans=0.125 2023-11-27 09:21:52,655 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574650 2023-11-27 09:21:59,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3831026.6666666665, ans=0.125 2023-11-27 09:22:01,093 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.168e+01 9.748e+01 1.058e+02 1.599e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 09:22:05,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3831026.6666666665, ans=0.125 2023-11-27 09:22:12,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3831093.3333333335, ans=0.2 2023-11-27 09:22:19,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3831093.3333333335, ans=0.125 2023-11-27 09:22:22,112 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9550, loss[loss=0.07926, simple_loss=0.1058, pruned_loss=0.01911, audio_tagging_loss=0.00724, over 14408.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08906, pruned_loss=0.01218, audio_tagging_loss=0.008716, over 3044403.60 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:22:31,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3831226.6666666665, ans=0.0 2023-11-27 09:22:33,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3831226.6666666665, ans=0.125 2023-11-27 09:22:34,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2023-11-27 09:22:39,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3831226.6666666665, ans=0.125 2023-11-27 09:22:44,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3831293.3333333335, ans=0.2 2023-11-27 09:22:46,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3831293.3333333335, ans=0.2 2023-11-27 09:22:48,526 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574700 2023-11-27 09:22:49,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831293.3333333335, ans=0.1 2023-11-27 09:23:12,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3831426.6666666665, ans=0.1 2023-11-27 09:23:13,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.66 vs. 
limit=15.0 2023-11-27 09:23:13,833 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:23:16,910 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9600, loss[loss=0.0765, simple_loss=0.1033, pruned_loss=0.01502, audio_tagging_loss=0.009829, over 14412.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08936, pruned_loss=0.01215, audio_tagging_loss=0.008759, over 3047353.04 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:23:21,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3831493.3333333335, ans=0.2 2023-11-27 09:23:24,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831493.3333333335, ans=0.1 2023-11-27 09:23:40,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3831626.6666666665, ans=0.125 2023-11-27 09:23:44,176 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574750 2023-11-27 09:23:51,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 9.086e+01 9.692e+01 1.047e+02 1.227e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-27 09:23:52,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2023-11-27 09:24:08,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831760.0, ans=0.1 2023-11-27 09:24:12,900 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9650, loss[loss=0.06926, simple_loss=0.1028, pruned_loss=0.009047, audio_tagging_loss=0.0088, over 14994.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08908, pruned_loss=0.01195, audio_tagging_loss=0.008805, over 3051446.82 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:24:18,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3831826.6666666665, ans=0.125 2023-11-27 09:24:19,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.95 vs. limit=15.0 2023-11-27 09:24:28,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3831893.3333333335, ans=0.125 2023-11-27 09:24:37,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3831960.0, ans=0.125 2023-11-27 09:24:39,541 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574800 2023-11-27 09:24:43,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3831960.0, ans=0.0 2023-11-27 09:24:48,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3832026.6666666665, ans=0.2 2023-11-27 09:25:00,872 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:25:09,698 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9700, loss[loss=0.06152, simple_loss=0.0793, pruned_loss=0.01392, audio_tagging_loss=0.007946, over 15187.00 frames. 
], tot_loss[loss=0.06523, simple_loss=0.0891, pruned_loss=0.01204, audio_tagging_loss=0.00865, over 3047648.67 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:25:16,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3832160.0, ans=0.1 2023-11-27 09:25:36,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574850 2023-11-27 09:25:44,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 9.066e+01 9.770e+01 1.059e+02 1.296e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-27 09:25:46,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0 2023-11-27 09:25:52,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3832360.0, ans=0.1 2023-11-27 09:25:59,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3832426.6666666665, ans=0.09899494936611666 2023-11-27 09:26:02,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=22.5 2023-11-27 09:26:05,285 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9750, loss[loss=0.0584, simple_loss=0.07715, pruned_loss=0.009177, audio_tagging_loss=0.01065, over 14728.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08908, pruned_loss=0.01203, audio_tagging_loss=0.008607, over 3043540.12 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:26:07,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3832493.3333333335, ans=0.125 2023-11-27 09:26:19,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3832560.0, ans=0.125 2023-11-27 09:26:29,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3832626.6666666665, ans=0.125 2023-11-27 09:26:32,887 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574900 2023-11-27 09:26:44,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=12.0 2023-11-27 09:26:52,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3832760.0, ans=0.1 2023-11-27 09:27:01,003 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9800, loss[loss=0.06254, simple_loss=0.08169, pruned_loss=0.01088, audio_tagging_loss=0.01082, over 15107.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08941, pruned_loss=0.01205, audio_tagging_loss=0.008513, over 3037243.53 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:27:08,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. 
limit=6.0 2023-11-27 09:27:08,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3832826.6666666665, ans=0.125 2023-11-27 09:27:19,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3832893.3333333335, ans=0.0 2023-11-27 09:27:19,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3832893.3333333335, ans=0.2 2023-11-27 09:27:28,127 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574950 2023-11-27 09:27:36,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 9.044e+01 9.762e+01 1.048e+02 1.288e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 09:27:36,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3833026.6666666665, ans=0.125 2023-11-27 09:27:42,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3833026.6666666665, ans=0.125 2023-11-27 09:27:44,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3833093.3333333335, ans=0.125 2023-11-27 09:27:51,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3833093.3333333335, ans=0.125 2023-11-27 09:27:51,910 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:27:57,774 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9850, loss[loss=0.05839, simple_loss=0.07765, pruned_loss=0.009109, audio_tagging_loss=0.01046, over 14374.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08951, pruned_loss=0.012, audio_tagging_loss=0.008455, over 3045266.02 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:28:00,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=15.0 2023-11-27 09:28:23,740 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575000 2023-11-27 09:28:39,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3833360.0, ans=0.0 2023-11-27 09:28:46,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0 2023-11-27 09:28:47,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3833426.6666666665, ans=0.125 2023-11-27 09:28:51,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=8.0 2023-11-27 09:28:53,336 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9900, loss[loss=0.062, simple_loss=0.08712, pruned_loss=0.009912, audio_tagging_loss=0.008523, over 15294.00 frames. 
], tot_loss[loss=0.06535, simple_loss=0.08998, pruned_loss=0.01196, audio_tagging_loss=0.008399, over 3050803.02 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:29:01,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3833493.3333333335, ans=0.125 2023-11-27 09:29:09,044 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:29:11,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2023-11-27 09:29:15,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3833626.6666666665, ans=0.0 2023-11-27 09:29:19,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3833626.6666666665, ans=0.2 2023-11-27 09:29:19,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3833626.6666666665, ans=0.125 2023-11-27 09:29:21,160 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575050 2023-11-27 09:29:26,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3833693.3333333335, ans=0.0 2023-11-27 09:29:28,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 9.072e+01 9.748e+01 1.058e+02 2.513e+02, threshold=1.950e+02, percent-clipped=1.0 2023-11-27 09:29:30,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3833693.3333333335, ans=0.1 2023-11-27 09:29:30,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3833693.3333333335, ans=0.0 2023-11-27 09:29:34,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3833693.3333333335, ans=0.125 2023-11-27 09:29:36,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=12.0 2023-11-27 09:29:45,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2023-11-27 09:29:49,342 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9950, loss[loss=0.0633, simple_loss=0.08353, pruned_loss=0.01163, audio_tagging_loss=0.009901, over 14792.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09047, pruned_loss=0.01216, audio_tagging_loss=0.008353, over 3049343.50 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:30:15,995 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575100 2023-11-27 09:30:35,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.62 vs. limit=15.0 2023-11-27 09:30:36,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3834093.3333333335, ans=0.125 2023-11-27 09:30:45,629 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10000, loss[loss=0.06602, simple_loss=0.08366, pruned_loss=0.0148, audio_tagging_loss=0.009388, over 16080.00 frames. 
], tot_loss[loss=0.06583, simple_loss=0.09043, pruned_loss=0.01227, audio_tagging_loss=0.008347, over 3047206.03 frames. ], batch size: 63, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:31:00,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3834226.6666666665, ans=0.1 2023-11-27 09:31:11,721 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575150 2023-11-27 09:31:12,855 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:31:16,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3834293.3333333335, ans=0.125 2023-11-27 09:31:21,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.035e+01 9.520e+01 1.022e+02 1.313e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 09:31:29,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3834426.6666666665, ans=0.2 2023-11-27 09:31:40,777 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10050, loss[loss=0.07939, simple_loss=0.1141, pruned_loss=0.01527, audio_tagging_loss=0.007091, over 14161.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08972, pruned_loss=0.01211, audio_tagging_loss=0.00848, over 3039584.95 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:31:43,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=12.0 2023-11-27 09:31:47,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=15.0 2023-11-27 09:31:49,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2023-11-27 09:32:07,304 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575200 2023-11-27 09:32:10,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.67 vs. limit=10.0 2023-11-27 09:32:13,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3834693.3333333335, ans=0.0 2023-11-27 09:32:24,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3834760.0, ans=0.0 2023-11-27 09:32:31,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3834760.0, ans=0.2 2023-11-27 09:32:36,498 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10100, loss[loss=0.08554, simple_loss=0.1263, pruned_loss=0.01609, audio_tagging_loss=0.006296, over 15311.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08992, pruned_loss=0.01195, audio_tagging_loss=0.00843, over 3042072.30 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:32:44,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. 
limit=15.0 2023-11-27 09:32:54,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3834893.3333333335, ans=0.125 2023-11-27 09:32:56,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3834893.3333333335, ans=0.125 2023-11-27 09:32:59,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3834960.0, ans=0.125 2023-11-27 09:33:03,236 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575250 2023-11-27 09:33:12,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.819e+01 8.969e+01 9.511e+01 1.051e+02 1.642e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 09:33:21,671 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:33:22,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3835093.3333333335, ans=0.1 2023-11-27 09:33:28,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3835093.3333333335, ans=0.05 2023-11-27 09:33:31,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3835160.0, ans=0.04949747468305833 2023-11-27 09:33:31,813 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10150, loss[loss=0.05689, simple_loss=0.07837, pruned_loss=0.008392, audio_tagging_loss=0.009312, over 15762.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.0897, pruned_loss=0.01199, audio_tagging_loss=0.008463, over 3045406.85 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:33:42,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3835226.6666666665, ans=0.125 2023-11-27 09:33:59,247 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:33:59,292 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575300 2023-11-27 09:34:04,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0 2023-11-27 09:34:28,271 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10200, loss[loss=0.05386, simple_loss=0.07201, pruned_loss=0.007767, audio_tagging_loss=0.01009, over 14680.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08904, pruned_loss=0.01187, audio_tagging_loss=0.008621, over 3052836.63 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:34:35,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3835493.3333333335, ans=0.0 2023-11-27 09:34:38,957 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:34:44,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3835560.0, ans=0.0 2023-11-27 09:34:47,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=3835560.0, ans=12.0 2023-11-27 09:34:48,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-27 09:34:48,931 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:34:54,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3835626.6666666665, ans=0.0 2023-11-27 09:34:54,845 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575350 2023-11-27 09:34:59,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3835626.6666666665, ans=0.2 2023-11-27 09:35:06,658 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 9.149e+01 9.728e+01 1.044e+02 1.552e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 09:35:15,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3835760.0, ans=0.95 2023-11-27 09:35:24,030 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10250, loss[loss=0.0809, simple_loss=0.1013, pruned_loss=0.01934, audio_tagging_loss=0.0109, over 15025.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08986, pruned_loss=0.01195, audio_tagging_loss=0.008613, over 3058404.95 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:35:38,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3835893.3333333335, ans=15.0 2023-11-27 09:35:50,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.47 vs. limit=10.0 2023-11-27 09:35:51,197 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575400 2023-11-27 09:36:03,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3836026.6666666665, ans=15.0 2023-11-27 09:36:19,991 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10300, loss[loss=0.07669, simple_loss=0.111, pruned_loss=0.01415, audio_tagging_loss=0.007031, over 15680.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09031, pruned_loss=0.01201, audio_tagging_loss=0.008594, over 3051040.86 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:36:23,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3836160.0, ans=0.125 2023-11-27 09:36:33,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3836226.6666666665, ans=0.125 2023-11-27 09:36:45,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3836293.3333333335, ans=0.125 2023-11-27 09:36:46,349 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575450 2023-11-27 09:36:57,293 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.883e+01 9.810e+01 1.061e+02 1.854e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 09:37:16,037 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10350, loss[loss=0.0614, simple_loss=0.08603, pruned_loss=0.01006, audio_tagging_loss=0.008324, over 16796.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08936, pruned_loss=0.01189, audio_tagging_loss=0.008708, over 3051258.60 frames. ], batch size: 65, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:37:42,140 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575500 2023-11-27 09:37:44,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3836626.6666666665, ans=0.125 2023-11-27 09:37:54,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3836693.3333333335, ans=0.125 2023-11-27 09:38:02,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3836760.0, ans=0.125 2023-11-27 09:38:06,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.45 vs. limit=10.0 2023-11-27 09:38:11,100 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10400, loss[loss=0.04376, simple_loss=0.05929, pruned_loss=0.004831, audio_tagging_loss=0.009288, over 14479.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08909, pruned_loss=0.01182, audio_tagging_loss=0.008778, over 3047356.32 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:38:18,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.93 vs. limit=10.0 2023-11-27 09:38:38,224 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575550 2023-11-27 09:38:39,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.14 vs. 
limit=22.5 2023-11-27 09:38:49,402 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 9.167e+01 9.734e+01 1.047e+02 2.020e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-27 09:38:55,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3837093.3333333335, ans=0.1 2023-11-27 09:39:02,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3837093.3333333335, ans=0.0 2023-11-27 09:39:06,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3837160.0, ans=0.1 2023-11-27 09:39:06,810 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10450, loss[loss=0.0683, simple_loss=0.0904, pruned_loss=0.01486, audio_tagging_loss=0.008242, over 14692.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08889, pruned_loss=0.01178, audio_tagging_loss=0.008739, over 3050534.97 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:39:19,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=15.0 2023-11-27 09:39:29,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3837293.3333333335, ans=0.0 2023-11-27 09:39:33,878 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575600 2023-11-27 09:39:35,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3837293.3333333335, ans=0.1 2023-11-27 09:40:03,169 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10500, loss[loss=0.09522, simple_loss=0.1326, pruned_loss=0.02339, audio_tagging_loss=0.005536, over 14917.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08955, pruned_loss=0.01198, audio_tagging_loss=0.008556, over 3044469.02 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:40:05,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3837493.3333333335, ans=0.0 2023-11-27 09:40:10,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3837493.3333333335, ans=0.1 2023-11-27 09:40:20,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3837560.0, ans=0.1 2023-11-27 09:40:29,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575650 2023-11-27 09:40:41,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.939e+01 9.707e+01 1.019e+02 1.510e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 09:40:41,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3837693.3333333335, ans=0.0 2023-11-27 09:40:58,841 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10550, loss[loss=0.06902, simple_loss=0.1057, pruned_loss=0.008733, audio_tagging_loss=0.007432, over 15760.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08925, pruned_loss=0.01197, audio_tagging_loss=0.008442, over 3036139.72 frames. 
2023-11-27 09:41:04,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3837826.6666666665, ans=0.125
2023-11-27 09:41:08,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3837893.3333333335, ans=0.125
2023-11-27 09:41:16,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3837893.3333333335, ans=0.1
2023-11-27 09:41:17,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3837893.3333333335, ans=0.1
2023-11-27 09:41:23,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0
2023-11-27 09:41:25,793 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575700
2023-11-27 09:41:35,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3838026.6666666665, ans=0.125
2023-11-27 09:41:36,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3838026.6666666665, ans=0.0
2023-11-27 09:41:50,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0
2023-11-27 09:41:54,064 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10600, loss[loss=0.05868, simple_loss=0.079, pruned_loss=0.009305, audio_tagging_loss=0.009882, over 15165.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08926, pruned_loss=0.012, audio_tagging_loss=0.008339, over 3038063.49 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:42:09,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=12.0
2023-11-27 09:42:20,789 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575750
2023-11-27 09:42:31,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 9.017e+01 9.530e+01 1.017e+02 1.584e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-27 09:42:49,581 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10650, loss[loss=0.06921, simple_loss=0.09979, pruned_loss=0.01295, audio_tagging_loss=0.006361, over 15458.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08946, pruned_loss=0.01203, audio_tagging_loss=0.008381, over 3045277.05 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:42:51,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3838493.3333333335, ans=0.09899494936611666
2023-11-27 09:43:13,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0
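The scaling.py:213 lines each report a named hyperparameter whose current value (ans=...) follows a schedule over batch_count. A minimal stand-in, assuming piecewise-linear interpolation between (batch, value) breakpoints in the manner of icefall's ScheduledFloat; by batch_count ≈ 3.8M every schedule here has long since settled at its final value (e.g. prob=0.125, dropout_p=0.1):

def scheduled_float(batch_count: float, points: list) -> float:
    # points: sorted (batch, value) breakpoints; clamp outside the range,
    # interpolate linearly inside it.
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (b0, v0), (b1, v1) in zip(points, points[1:]):
        if b0 <= batch_count <= b1:
            return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
    raise AssertionError("unreachable for sorted breakpoints")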
2023-11-27 09:43:15,915 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575800
2023-11-27 09:43:29,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3838693.3333333335, ans=0.125
2023-11-27 09:43:30,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3838693.3333333335, ans=0.125
2023-11-27 09:43:44,273 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10700, loss[loss=0.06251, simple_loss=0.08935, pruned_loss=0.01016, audio_tagging_loss=0.007666, over 14788.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.09001, pruned_loss=0.01204, audio_tagging_loss=0.0084, over 3046372.42 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:44:11,400 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575850
2023-11-27 09:44:14,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.86 vs. limit=15.0
2023-11-27 09:44:15,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0
2023-11-27 09:44:15,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3838960.0, ans=0.1
2023-11-27 09:44:21,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3839026.6666666665, ans=0.125
2023-11-27 09:44:21,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 9.303e+01 9.868e+01 1.046e+02 1.253e+02, threshold=1.974e+02, percent-clipped=0.0
2023-11-27 09:44:30,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3839093.3333333335, ans=0.125
2023-11-27 09:44:30,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3839093.3333333335, ans=0.125
2023-11-27 09:44:32,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3839093.3333333335, ans=0.0
2023-11-27 09:44:39,736 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10750, loss[loss=0.05572, simple_loss=0.08691, pruned_loss=0.007127, audio_tagging_loss=0.005139, over 14756.00 frames. ], tot_loss[loss=0.065, simple_loss=0.0893, pruned_loss=0.01192, audio_tagging_loss=0.008422, over 3053671.38 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:44:43,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3839160.0, ans=0.2
2023-11-27 09:44:49,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3839226.6666666665, ans=0.125
2023-11-27 09:44:59,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3839226.6666666665, ans=0.125
2023-11-27 09:44:59,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=12.0
2023-11-27 09:45:05,855 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575900
2023-11-27 09:45:29,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0
2023-11-27 09:45:30,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3839426.6666666665, ans=0.0
2023-11-27 09:45:34,368 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10800, loss[loss=0.07109, simple_loss=0.09666, pruned_loss=0.01505, audio_tagging_loss=0.007717, over 15121.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08874, pruned_loss=0.01171, audio_tagging_loss=0.008397, over 3052787.42 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:45:51,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3839560.0, ans=0.125
2023-11-27 09:46:00,520 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575950
2023-11-27 09:46:11,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.948e+01 8.958e+01 9.647e+01 1.051e+02 1.313e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-27 09:46:21,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3839760.0, ans=0.125
2023-11-27 09:46:23,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3839760.0, ans=0.0
2023-11-27 09:46:26,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3839760.0, ans=0.125
2023-11-27 09:46:28,741 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10850, loss[loss=0.06629, simple_loss=0.09045, pruned_loss=0.01237, audio_tagging_loss=0.008694, over 15875.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08896, pruned_loss=0.01194, audio_tagging_loss=0.008477, over 3052240.67 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:46:47,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0
2023-11-27 09:46:53,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3839960.0, ans=0.1
2023-11-27 09:46:55,977 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576000
2023-11-27 09:47:22,921 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 09:47:26,025 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10900, loss[loss=0.06295, simple_loss=0.09747, pruned_loss=0.00821, audio_tagging_loss=0.006006, over 15596.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08989, pruned_loss=0.01208, audio_tagging_loss=0.00852, over 3053435.24 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
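The WARNING above shows the transducer-safety filter at work: this AudioSet cut carries a placeholder transcript, and it yields more BPE tokens (24) than encoder frames after subsampling (23), so it is dropped. The frame arithmetic matches a Zipformer-style encoder_embed that maps T input frames to roughly ((T - 7) // 2) // 2 outputs. A hedged sketch of the filter (the exact margin used by train_asr.py may differ):

def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
    # Approximate Zipformer encoder_embed length: 100 -> 46 -> 23 frames.
    t_out = ((num_frames_before_subsampling - 7) // 2) // 2
    # A pruned transducer needs at least one encoder frame per token.
    return t_out >= num_tokens

# keep_cut(100, 24) -> False, matching the excluded cut above.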
2023-11-27 09:47:50,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3840293.3333333335, ans=0.0
2023-11-27 09:47:52,300 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576050
2023-11-27 09:48:03,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.173e+01 9.584e+01 1.016e+02 1.234e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-27 09:48:05,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3840360.0, ans=0.0
2023-11-27 09:48:15,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3840426.6666666665, ans=0.2
2023-11-27 09:48:21,470 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10950, loss[loss=0.05896, simple_loss=0.0837, pruned_loss=0.0083, audio_tagging_loss=0.00881, over 15661.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09025, pruned_loss=0.01209, audio_tagging_loss=0.008567, over 3049690.59 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:48:24,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3840493.3333333335, ans=0.1
2023-11-27 09:48:47,712 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576100
2023-11-27 09:48:56,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3840693.3333333335, ans=0.125
2023-11-27 09:49:02,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3840693.3333333335, ans=0.125
2023-11-27 09:49:03,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3840693.3333333335, ans=0.0
2023-11-27 09:49:05,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.24 vs. limit=5.0
2023-11-27 09:49:15,867 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11000, loss[loss=0.05238, simple_loss=0.06647, pruned_loss=0.007767, audio_tagging_loss=0.01138, over 16398.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.0892, pruned_loss=0.01179, audio_tagging_loss=0.008656, over 3049804.06 frames. ], batch size: 64, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:49:24,254 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 09:49:34,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3840893.3333333335, ans=0.125
2023-11-27 09:49:40,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0
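grad_scale in the tot_loss lines is the fp16 loss scale, which moves in powers of two through this section (8 → 16 → 32, and back down later). That is standard dynamic loss scaling: halve on an overflowed gradient, double after a stretch of clean steps. The generic torch.cuda.amp pattern looks like the sketch below; the recipe drives its own scaler, so the constructor values here are assumptions chosen to match the logged behavior:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,        # matches the grad_scale first seen above
    growth_factor=2.0,     # doubling, e.g. 8 -> 16 -> 32
    backoff_factor=0.5,    # halving after an inf/nan gradient
    growth_interval=2000,  # clean steps required before growing (assumed)
)

def training_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skips the update if gradients overflowed
    scaler.update()          # adjusts the scale; logged as grad_scale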
2023-11-27 09:49:41,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3840960.0, ans=0.2
2023-11-27 09:49:42,385 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576150
2023-11-27 09:49:53,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.866e+01 8.909e+01 9.397e+01 1.014e+02 1.657e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-27 09:49:57,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3841026.6666666665, ans=0.0
2023-11-27 09:50:04,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0
2023-11-27 09:50:04,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3841093.3333333335, ans=0.125
2023-11-27 09:50:04,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3841093.3333333335, ans=0.0
2023-11-27 09:50:10,562 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11050, loss[loss=0.07362, simple_loss=0.1143, pruned_loss=0.01086, audio_tagging_loss=0.005618, over 16579.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08963, pruned_loss=0.01188, audio_tagging_loss=0.008625, over 3056082.62 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:50:17,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3841160.0, ans=0.09899494936611666
2023-11-27 09:50:19,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3841160.0, ans=0.125
2023-11-27 09:50:37,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576200
2023-11-27 09:50:40,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.58 vs. limit=10.0
2023-11-27 09:50:40,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3841293.3333333335, ans=0.0
2023-11-27 09:50:50,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3841360.0, ans=0.0
2023-11-27 09:51:05,941 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11100, loss[loss=0.06265, simple_loss=0.08924, pruned_loss=0.01148, audio_tagging_loss=0.006543, over 13392.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08909, pruned_loss=0.01183, audio_tagging_loss=0.008814, over 3055734.34 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:51:10,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3841493.3333333335, ans=0.0
2023-11-27 09:51:23,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3841560.0, ans=0.125
2023-11-27 09:51:29,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0
2023-11-27 09:51:31,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.39 vs. limit=10.0
2023-11-27 09:51:32,096 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576250
2023-11-27 09:51:36,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3841626.6666666665, ans=0.0
2023-11-27 09:51:40,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0
2023-11-27 09:51:44,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.157e+01 9.860e+01 1.054e+02 1.486e+02, threshold=1.972e+02, percent-clipped=0.0
2023-11-27 09:51:49,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3841760.0, ans=0.125
2023-11-27 09:51:50,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3841760.0, ans=0.0
2023-11-27 09:52:00,851 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11150, loss[loss=0.06923, simple_loss=0.08922, pruned_loss=0.01633, audio_tagging_loss=0.008301, over 14764.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08841, pruned_loss=0.01174, audio_tagging_loss=0.008944, over 3051273.96 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:52:01,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.66 vs. limit=15.0
2023-11-27 09:52:04,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3841826.6666666665, ans=0.125
2023-11-27 09:52:17,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.09 vs. limit=10.0
2023-11-27 09:52:18,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3841893.3333333335, ans=0.125
2023-11-27 09:52:24,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3841960.0, ans=0.0
2023-11-27 09:52:27,405 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576300
2023-11-27 09:52:27,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=12.0
2023-11-27 09:52:41,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3842026.6666666665, ans=0.125
2023-11-27 09:52:42,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3842026.6666666665, ans=10.0
2023-11-27 09:52:55,662 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11200, loss[loss=0.06137, simple_loss=0.08168, pruned_loss=0.01174, audio_tagging_loss=0.008782, over 16032.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08846, pruned_loss=0.01183, audio_tagging_loss=0.008958, over 3044238.16 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:52:55,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3842160.0, ans=0.125
2023-11-27 09:52:59,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3842160.0, ans=0.125
2023-11-27 09:53:22,311 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576350
2023-11-27 09:53:23,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3842293.3333333335, ans=0.05
2023-11-27 09:53:30,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3842360.0, ans=0.0
2023-11-27 09:53:33,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 9.019e+01 9.427e+01 1.023e+02 1.335e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-27 09:53:50,655 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11250, loss[loss=0.068, simple_loss=0.1008, pruned_loss=0.01055, audio_tagging_loss=0.007071, over 16029.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08795, pruned_loss=0.01163, audio_tagging_loss=0.008905, over 3052108.92 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:53:58,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.92 vs. limit=10.0
2023-11-27 09:54:01,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3842560.0, ans=0.125
2023-11-27 09:54:16,335 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576400
2023-11-27 09:54:36,047 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 09:54:38,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0
2023-11-27 09:54:42,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3842760.0, ans=0.0
2023-11-27 09:54:42,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3842760.0, ans=0.0
2023-11-27 09:54:45,856 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11300, loss[loss=0.06386, simple_loss=0.09067, pruned_loss=0.009864, audio_tagging_loss=0.00866, over 14494.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08868, pruned_loss=0.01177, audio_tagging_loss=0.008734, over 3050857.91 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:54:46,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3842826.6666666665, ans=0.07
2023-11-27 09:54:50,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.39 vs. limit=15.0
2023-11-27 09:54:56,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3842893.3333333335, ans=0.0
2023-11-27 09:55:11,920 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576450
2023-11-27 09:55:15,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3842960.0, ans=0.125
2023-11-27 09:55:20,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3843026.6666666665, ans=0.125
2023-11-27 09:55:25,404 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 9.044e+01 9.676e+01 1.062e+02 1.427e+02, threshold=1.935e+02, percent-clipped=0.0
2023-11-27 09:55:32,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3843093.3333333335, ans=0.125
2023-11-27 09:55:40,052 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11350, loss[loss=0.07008, simple_loss=0.0997, pruned_loss=0.009528, audio_tagging_loss=0.0107, over 16592.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08829, pruned_loss=0.0116, audio_tagging_loss=0.008625, over 3046888.48 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:55:42,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3843160.0, ans=0.0
2023-11-27 09:55:46,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0
2023-11-27 09:55:58,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3843226.6666666665, ans=0.125
2023-11-27 09:55:59,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3843226.6666666665, ans=0.1
2023-11-27 09:56:01,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3843226.6666666665, ans=0.2
2023-11-27 09:56:01,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3843226.6666666665, ans=0.125
2023-11-27 09:56:07,294 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576500
2023-11-27 09:56:09,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3843293.3333333335, ans=15.0
2023-11-27 09:56:19,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0
2023-11-27 09:56:35,311 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11400, loss[loss=0.05691, simple_loss=0.07574, pruned_loss=0.008437, audio_tagging_loss=0.0106, over 15023.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08861, pruned_loss=0.01167, audio_tagging_loss=0.008598, over 3048915.55 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
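Each tot_loss[... over N frames ...] entry is an aggregate over the recent window, weighted by frame counts rather than a plain per-batch mean (the frame totals hover around 3.05M here). A frame-weighted tracker in the spirit of icefall's MetricsTracker, reduced to a single metric for illustration:

class FrameWeightedLoss:
    # Accumulate loss * frames so short and long batches are weighted fairly.
    def __init__(self) -> None:
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)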
2023-11-27 09:56:54,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3843560.0, ans=0.0
2023-11-27 09:56:57,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3843626.6666666665, ans=0.0
2023-11-27 09:57:01,484 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576550
2023-11-27 09:57:06,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3843626.6666666665, ans=0.125
2023-11-27 09:57:14,793 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 09:57:15,548 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 9.108e+01 9.687e+01 1.045e+02 1.288e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-27 09:57:30,708 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11450, loss[loss=0.05309, simple_loss=0.0611, pruned_loss=0.01068, audio_tagging_loss=0.01186, over 13854.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08843, pruned_loss=0.01174, audio_tagging_loss=0.008558, over 3044699.75 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 8.0
2023-11-27 09:57:36,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3843826.6666666665, ans=0.07
2023-11-27 09:57:36,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3843826.6666666665, ans=0.2
2023-11-27 09:57:39,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=22.5
2023-11-27 09:57:45,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3843893.3333333335, ans=0.04949747468305833
2023-11-27 09:57:46,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3843893.3333333335, ans=0.125
2023-11-27 09:57:56,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576600
2023-11-27 09:58:19,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3844093.3333333335, ans=0.1
2023-11-27 09:58:22,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3844093.3333333335, ans=0.0
2023-11-27 09:58:25,204 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11500, loss[loss=0.0625, simple_loss=0.08409, pruned_loss=0.0121, audio_tagging_loss=0.008359, over 15628.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08782, pruned_loss=0.01174, audio_tagging_loss=0.008603, over 3047062.11 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 8.0
2023-11-27 09:58:30,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3844160.0, ans=0.125
2023-11-27 09:58:46,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3844293.3333333335, ans=0.125
2023-11-27 09:58:52,263 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576650
2023-11-27 09:59:00,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3844360.0, ans=0.2
2023-11-27 09:59:02,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=12.0
2023-11-27 09:59:05,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.980e+01 9.059e+01 9.598e+01 1.033e+02 1.422e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-27 09:59:19,835 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11550, loss[loss=0.08839, simple_loss=0.1166, pruned_loss=0.02034, audio_tagging_loss=0.009771, over 15827.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08821, pruned_loss=0.01182, audio_tagging_loss=0.008634, over 3042647.25 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0
2023-11-27 09:59:33,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3844560.0, ans=0.125
2023-11-27 09:59:33,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0
2023-11-27 09:59:35,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3844560.0, ans=0.125
2023-11-27 09:59:46,685 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576700
2023-11-27 09:59:49,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0
2023-11-27 09:59:54,494 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 10:00:08,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3844760.0, ans=0.1
2023-11-27 10:00:15,299 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11600, loss[loss=0.06408, simple_loss=0.0884, pruned_loss=0.01175, audio_tagging_loss=0.008133, over 15411.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08777, pruned_loss=0.01165, audio_tagging_loss=0.008671, over 3044600.05 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:00:27,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2023-11-27 10:00:31,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3844893.3333333335, ans=0.125
2023-11-27 10:00:31,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3844893.3333333335, ans=0.125
2023-11-27 10:00:41,398 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576750
2023-11-27 10:00:55,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 9.083e+01 9.816e+01 1.051e+02 1.317e+02, threshold=1.963e+02, percent-clipped=0.0
2023-11-27 10:00:57,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3845026.6666666665, ans=0.5
2023-11-27 10:01:01,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3845093.3333333335, ans=0.125
2023-11-27 10:01:05,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0
2023-11-27 10:01:09,977 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11650, loss[loss=0.09691, simple_loss=0.1378, pruned_loss=0.02187, audio_tagging_loss=0.006131, over 16081.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08895, pruned_loss=0.01179, audio_tagging_loss=0.008518, over 3052171.28 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:01:14,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3845160.0, ans=0.1
2023-11-27 10:01:18,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5
2023-11-27 10:01:25,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3845226.6666666665, ans=0.0
2023-11-27 10:01:32,640 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 10:01:35,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3845293.3333333335, ans=0.125
2023-11-27 10:01:36,734 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576800
2023-11-27 10:01:36,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3845293.3333333335, ans=0.1
2023-11-27 10:01:50,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3845360.0, ans=0.0
2023-11-27 10:01:52,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3845360.0, ans=0.05
2023-11-27 10:02:05,311 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11700, loss[loss=0.05517, simple_loss=0.07572, pruned_loss=0.007861, audio_tagging_loss=0.009444, over 14909.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08852, pruned_loss=0.01171, audio_tagging_loss=0.008571, over 3047667.74 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
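The Whitening lines above compare a per-module statistic against a limit. Assuming a metric like the one in icefall's scaling.py, it measures how far the feature covariance is from a multiple of the identity (exactly 1.0 for perfectly "white" features), and the module only intervenes in the backward pass when the metric exceeds its limit; the occasional printouts are diagnostics. A sketch of that statistic (centering and per-group handling are simplifications):

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns d * ||C||_F^2 / trace(C)^2,
    # which equals 1.0 iff the covariance C is proportional to the identity.
    x = x - x.mean(dim=0)
    c = (x.T @ x) / x.shape[0]
    d = c.shape[0]
    return d * (c * c).sum() / (c.diagonal().sum() ** 2)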
2023-11-27 10:02:24,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3845560.0, ans=0.09899494936611666
2023-11-27 10:02:31,913 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576850
2023-11-27 10:02:45,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.976e+01 8.983e+01 9.560e+01 1.031e+02 1.339e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-27 10:02:50,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3845760.0, ans=0.125
2023-11-27 10:03:00,659 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11750, loss[loss=0.06626, simple_loss=0.08825, pruned_loss=0.01425, audio_tagging_loss=0.007889, over 15370.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08877, pruned_loss=0.01177, audio_tagging_loss=0.008515, over 3048212.40 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:03:26,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3845960.0, ans=0.2
2023-11-27 10:03:26,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0
2023-11-27 10:03:26,919 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576900
2023-11-27 10:03:27,038 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 10:03:43,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3846093.3333333335, ans=0.0
2023-11-27 10:03:53,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3846093.3333333335, ans=0.125
2023-11-27 10:03:55,768 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11800, loss[loss=0.04533, simple_loss=0.05406, pruned_loss=0.00638, audio_tagging_loss=0.01192, over 14638.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08869, pruned_loss=0.01185, audio_tagging_loss=0.008547, over 3044182.95 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:03:58,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0
2023-11-27 10:04:09,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3846226.6666666665, ans=15.0
2023-11-27 10:04:10,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.26 vs. limit=22.5
2023-11-27 10:04:21,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0
2023-11-27 10:04:22,313 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576950
2023-11-27 10:04:23,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3846293.3333333335, ans=0.125
2023-11-27 10:04:28,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3846360.0, ans=0.0
2023-11-27 10:04:32,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3846360.0, ans=0.0
2023-11-27 10:04:36,323 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.212e+01 9.788e+01 1.058e+02 1.368e+02, threshold=1.958e+02, percent-clipped=0.0
2023-11-27 10:04:46,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3846426.6666666665, ans=0.0
2023-11-27 10:04:50,444 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11850, loss[loss=0.05122, simple_loss=0.06249, pruned_loss=0.01018, audio_tagging_loss=0.009791, over 14987.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08861, pruned_loss=0.01188, audio_tagging_loss=0.008753, over 3041732.05 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:04:59,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3846493.3333333335, ans=0.1
2023-11-27 10:05:04,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3846560.0, ans=0.125
2023-11-27 10:05:07,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3846560.0, ans=0.0
2023-11-27 10:05:17,040 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577000
2023-11-27 10:05:19,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3846626.6666666665, ans=0.0
2023-11-27 10:05:46,121 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11900, loss[loss=0.08429, simple_loss=0.1094, pruned_loss=0.01886, audio_tagging_loss=0.01072, over 15111.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08899, pruned_loss=0.01191, audio_tagging_loss=0.008798, over 3043290.73 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:06:02,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3846893.3333333335, ans=0.0
2023-11-27 10:06:08,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3846960.0, ans=0.0
2023-11-27 10:06:12,353 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577050
2023-11-27 10:06:15,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3846960.0, ans=0.125
2023-11-27 10:06:21,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0
2023-11-27 10:06:26,333 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.851e+01 8.935e+01 9.684e+01 1.048e+02 1.256e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-27 10:06:29,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3847093.3333333335, ans=0.0
2023-11-27 10:06:38,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3847093.3333333335, ans=0.125
2023-11-27 10:06:40,518 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11950, loss[loss=0.06195, simple_loss=0.07333, pruned_loss=0.01291, audio_tagging_loss=0.01238, over 16411.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08802, pruned_loss=0.01181, audio_tagging_loss=0.008898, over 3044752.03 frames. ], batch size: 65, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:06:57,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3847226.6666666665, ans=0.125
2023-11-27 10:07:05,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0
2023-11-27 10:07:07,299 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577100
2023-11-27 10:07:08,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3847293.3333333335, ans=0.0
2023-11-27 10:07:08,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3847293.3333333335, ans=0.1
2023-11-27 10:07:12,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3847293.3333333335, ans=0.0
2023-11-27 10:07:33,997 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 12000, loss[loss=0.08479, simple_loss=0.1172, pruned_loss=0.01773, audio_tagging_loss=0.008468, over 15152.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08805, pruned_loss=0.01172, audio_tagging_loss=0.008919, over 3051392.48 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 10:07:33,998 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-27 10:08:06,003 INFO [train_asr.py:1267] (1/4) Epoch 48, validation: loss=0.05797, simple_loss=0.05046, pruned_loss=0.005369, audio_tagging_loss=0.02737, over 4681554.00 frames.
2023-11-27 10:08:06,003 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-27 10:08:21,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2023-11-27 10:08:23,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3847560.0, ans=0.0
2023-11-27 10:08:24,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3847560.0, ans=0.09899494936611666
2023-11-27 10:08:58,287 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 0, loss[loss=0.08561, simple_loss=0.1026, pruned_loss=0.01629, audio_tagging_loss=0.018, over 16024.00 frames. ], tot_loss[loss=0.08561, simple_loss=0.1026, pruned_loss=0.01629, audio_tagging_loss=0.018, over 16024.00 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0
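At batch 12000 the loop pauses for validation. The validation loss decomposes the same way as the training loss (0.5 × 0.05046 + 0.005369 + 0.02737 ≈ 0.05797 under the assumed scales from the earlier sketch), and the peak-memory line comes from CUDA's allocator statistics. A minimal version of that report:

import torch

def log_peak_memory(device: torch.device) -> None:
    # High-water mark of allocated CUDA memory on this rank since startup.
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")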
2023-11-27 10:08:58,288 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-27 10:09:12,423 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8428, 5.8654, 5.8975, 5.9182], device='cuda:1')
2023-11-27 10:09:21,834 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3434, 4.3153, 4.4746, 4.4700], device='cuda:1')
2023-11-27 10:09:25,570 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8408, 4.9518, 5.1516, 4.9171], device='cuda:1')
2023-11-27 10:09:29,259 INFO [train_asr.py:1267] (1/4) Epoch 49, validation: loss=0.05781, simple_loss=0.05038, pruned_loss=0.005301, audio_tagging_loss=0.02732, over 4681554.00 frames.
2023-11-27 10:09:29,260 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-27 10:09:29,313 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577150
2023-11-27 10:09:29,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3847653.3333333335, ans=0.09899494936611666
2023-11-27 10:09:30,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3847653.3333333335, ans=0.2
2023-11-27 10:09:42,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3847720.0, ans=0.0
2023-11-27 10:09:42,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.973e+01 9.407e+01 1.008e+02 1.108e+02 1.423e+02, threshold=2.015e+02, percent-clipped=0.0
2023-11-27 10:10:07,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3847853.3333333335, ans=0.125
2023-11-27 10:10:23,729 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 50, loss[loss=0.07292, simple_loss=0.09443, pruned_loss=0.01243, audio_tagging_loss=0.01328, over 14740.00 frames. ], tot_loss[loss=0.07328, simple_loss=0.08981, pruned_loss=0.01186, audio_tagging_loss=0.01651, over 693630.54 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0
2023-11-27 10:10:23,781 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577200
2023-11-27 10:10:30,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=22.5
2023-11-27 10:11:07,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3848253.3333333335, ans=0.02
2023-11-27 10:11:07,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3848253.3333333335, ans=0.1
2023-11-27 10:11:11,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3848253.3333333335, ans=0.1
2023-11-27 10:11:19,480 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 100, loss[loss=0.07737, simple_loss=0.1124, pruned_loss=0.0111, audio_tagging_loss=0.01008, over 14426.00 frames. ], tot_loss[loss=0.07267, simple_loss=0.09, pruned_loss=0.01192, audio_tagging_loss=0.01575, over 1216445.89 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0
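During validation the Zipformer dumps the entropy of selected self-attention weight distributions, one value per head: roughly uniform attention over N positions gives entropy near log(N), while collapse toward a single frame drives it to 0, so these numbers are a quick health check on the heads. A generic per-head computation (the module's exact reduction is an assumption):

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, tgt_len, src_len), rows already softmax-normalized.
    p = attn.clamp_min(1e-20)
    per_position = -(p * p.log()).sum(dim=-1)  # entropy of each row
    return per_position.mean(dim=-1)           # one value per head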
2023-11-27 10:11:19,538 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577250
2023-11-27 10:11:34,070 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.548e+01 9.835e+01 1.039e+02 1.086e+02 1.551e+02, threshold=2.079e+02, percent-clipped=0.0
2023-11-27 10:11:36,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3848386.6666666665, ans=0.125
2023-11-27 10:11:37,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0
2023-11-27 10:11:45,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3848453.3333333335, ans=0.015
2023-11-27 10:12:01,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3848520.0, ans=0.04949747468305833
2023-11-27 10:12:14,675 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 150, loss[loss=0.06927, simple_loss=0.09608, pruned_loss=0.01286, audio_tagging_loss=0.008367, over 15188.00 frames. ], tot_loss[loss=0.07173, simple_loss=0.09124, pruned_loss=0.01195, audio_tagging_loss=0.01416, over 1628608.27 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:12:14,742 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577300
2023-11-27 10:12:21,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3848653.3333333335, ans=0.0
2023-11-27 10:12:51,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3848853.3333333335, ans=0.125
2023-11-27 10:12:57,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3848920.0, ans=0.125
2023-11-27 10:13:09,318 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 200, loss[loss=0.08053, simple_loss=0.1083, pruned_loss=0.01576, audio_tagging_loss=0.0106, over 15031.00 frames. ], tot_loss[loss=0.07008, simple_loss=0.09119, pruned_loss=0.01186, audio_tagging_loss=0.01263, over 1935665.94 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
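The printed lr drops from 1.40e-03 during epoch 48 to 1.38e-03 at the start of epoch 49. Both values are reproduced by icefall's Eden schedule with this run's configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, if epoch is taken as the number of completed epochs (a reading consistent with this log, though the recipe's exact convention is an assumption):

def eden_lr(base_lr: float, step: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # lr = base_lr * ((step^2 + B^2)/B^2)^-0.25 * ((epoch^2 + E^2)/E^2)^-0.25
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor

# eden_lr(0.045, 575450, 47) ≈ 1.40e-03 and eden_lr(0.045, 577150, 48) ≈ 1.38e-03,
# matching the values logged above.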
2023-11-27 10:13:09,386 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577350
2023-11-27 10:13:19,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3849053.3333333335, ans=0.0
2023-11-27 10:13:22,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3849053.3333333335, ans=0.1
2023-11-27 10:13:25,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.806e+01 9.137e+01 9.838e+01 1.045e+02 1.312e+02, threshold=1.968e+02, percent-clipped=0.0
2023-11-27 10:13:39,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3849120.0, ans=0.125
2023-11-27 10:13:45,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3849186.6666666665, ans=0.125
2023-11-27 10:13:57,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3849253.3333333335, ans=0.1
2023-11-27 10:13:57,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0
2023-11-27 10:14:04,763 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 250, loss[loss=0.08117, simple_loss=0.1131, pruned_loss=0.0176, audio_tagging_loss=0.007028, over 14506.00 frames. ], tot_loss[loss=0.06904, simple_loss=0.09178, pruned_loss=0.0118, audio_tagging_loss=0.01134, over 2180066.40 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:14:04,826 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577400
2023-11-27 10:15:00,637 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 300, loss[loss=0.07033, simple_loss=0.09306, pruned_loss=0.0129, audio_tagging_loss=0.0109, over 15505.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.09174, pruned_loss=0.01173, audio_tagging_loss=0.01045, over 2371422.02 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:15:00,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577450
2023-11-27 10:15:15,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 9.252e+01 9.785e+01 1.052e+02 1.385e+02, threshold=1.957e+02, percent-clipped=0.0
2023-11-27 10:15:28,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3849786.6666666665, ans=0.0
2023-11-27 10:15:55,341 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 350, loss[loss=0.06356, simple_loss=0.0924, pruned_loss=0.009971, audio_tagging_loss=0.007387, over 15007.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.08986, pruned_loss=0.01159, audio_tagging_loss=0.01012, over 2520320.24 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:15:55,402 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577500
2023-11-27 10:16:00,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0
2023-11-27 10:16:50,559 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 400, loss[loss=0.09429, simple_loss=0.1368, pruned_loss=0.01923, audio_tagging_loss=0.006653, over 16322.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08933, pruned_loss=0.01149, audio_tagging_loss=0.009678, over 2634650.44 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:16:50,622 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577550
2023-11-27 10:16:51,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3850320.0, ans=0.125
2023-11-27 10:16:52,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3850320.0, ans=0.0
2023-11-27 10:17:07,270 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.990e+01 9.603e+01 1.040e+02 1.304e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-27 10:17:07,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3850386.6666666665, ans=0.125
2023-11-27 10:17:20,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3850453.3333333335, ans=0.125
2023-11-27 10:17:32,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3850520.0, ans=0.125
2023-11-27 10:17:43,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3850586.6666666665, ans=0.1
2023-11-27 10:17:46,059 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 450, loss[loss=0.05468, simple_loss=0.07585, pruned_loss=0.01007, audio_tagging_loss=0.006679, over 14096.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08986, pruned_loss=0.01174, audio_tagging_loss=0.009321, over 2727737.64 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:17:46,127 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577600
2023-11-27 10:17:46,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3850653.3333333335, ans=0.2
2023-11-27 10:17:49,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3850653.3333333335, ans=0.125
2023-11-27 10:17:52,870 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 10:18:08,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3850786.6666666665, ans=0.125
2023-11-27 10:18:10,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.49 vs. limit=10.0
limit=10.0 2023-11-27 10:18:18,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3850853.3333333335, ans=0.0 2023-11-27 10:18:37,580 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:18:38,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3850920.0, ans=0.1 2023-11-27 10:18:39,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3850986.6666666665, ans=0.0 2023-11-27 10:18:40,539 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 500, loss[loss=0.0797, simple_loss=0.1152, pruned_loss=0.01614, audio_tagging_loss=0.005958, over 14948.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08925, pruned_loss=0.01173, audio_tagging_loss=0.009183, over 2795082.99 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:18:40,613 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577650 2023-11-27 10:18:42,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3850986.6666666665, ans=0.2 2023-11-27 10:18:50,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3851053.3333333335, ans=0.0 2023-11-27 10:18:54,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3851053.3333333335, ans=0.0 2023-11-27 10:18:56,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 9.066e+01 9.728e+01 1.042e+02 1.279e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 10:18:56,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3851053.3333333335, ans=0.025 2023-11-27 10:19:08,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3851120.0, ans=0.2 2023-11-27 10:19:13,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3851186.6666666665, ans=0.125 2023-11-27 10:19:29,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3851253.3333333335, ans=0.125 2023-11-27 10:19:34,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3851320.0, ans=0.1 2023-11-27 10:19:35,130 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 550, loss[loss=0.06653, simple_loss=0.08726, pruned_loss=0.0117, audio_tagging_loss=0.0112, over 15085.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08934, pruned_loss=0.01178, audio_tagging_loss=0.009041, over 2847604.99 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:19:35,192 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577700 2023-11-27 10:19:39,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3851320.0, ans=0.025 2023-11-27 10:19:52,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.52 vs. 
limit=12.0 2023-11-27 10:20:11,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3851520.0, ans=0.2 2023-11-27 10:20:14,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3851520.0, ans=0.0 2023-11-27 10:20:16,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3851520.0, ans=0.1 2023-11-27 10:20:28,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3851586.6666666665, ans=0.125 2023-11-27 10:20:30,423 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 600, loss[loss=0.05956, simple_loss=0.07791, pruned_loss=0.01249, audio_tagging_loss=0.008117, over 14568.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08846, pruned_loss=0.01174, audio_tagging_loss=0.008935, over 2886866.53 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:20:30,487 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577750 2023-11-27 10:20:34,806 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:20:39,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3851653.3333333335, ans=0.125 2023-11-27 10:20:40,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3851653.3333333335, ans=0.125 2023-11-27 10:20:43,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3851720.0, ans=0.125 2023-11-27 10:20:44,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3851720.0, ans=0.0 2023-11-27 10:20:45,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3851720.0, ans=0.2 2023-11-27 10:20:47,118 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 9.017e+01 9.537e+01 1.031e+02 1.710e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 10:21:00,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3851786.6666666665, ans=0.125 2023-11-27 10:21:19,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=22.5 2023-11-27 10:21:23,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3851920.0, ans=0.125 2023-11-27 10:21:25,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3851986.6666666665, ans=0.09899494936611666 2023-11-27 10:21:25,965 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 650, loss[loss=0.07263, simple_loss=0.1045, pruned_loss=0.01335, audio_tagging_loss=0.007033, over 16373.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08866, pruned_loss=0.01186, audio_tagging_loss=0.008885, over 2923971.35 frames. 
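
A note on the scaling.py:213 lines: ScheduledFloat parameters are hyperparameters (dropout probabilities, skip rates, balancer probabilities) defined as functions of the global batch_count rather than constants; ans is the value currently in effect. A minimal sketch of the underlying piecewise-linear schedule, assuming (batch_count, value) breakpoints clamped outside their range (class name and breakpoints are illustrative):

    class PiecewiseLinear:
        """Value linearly interpolated between sorted (x, y) breakpoints,
        clamped to the end values outside them."""
        def __init__(self, *points):
            self.points = sorted(points)        # e.g. (0.0, 0.2), (4000.0, 0.0)

        def __call__(self, x: float) -> float:
            pts = self.points
            if x <= pts[0][0]:
                return pts[0][1]
            if x >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= x <= x1:
                    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    # By batch_count ~ 3.85e6 most schedules have long since reached their
    # final values, e.g. conv_skip_rate -> ans=0.0 and dropout_p -> ans=0.1.
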
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:21:26,027 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577800 2023-11-27 10:21:35,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3851986.6666666665, ans=0.0 2023-11-27 10:21:44,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3852053.3333333335, ans=0.125 2023-11-27 10:21:44,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3852053.3333333335, ans=0.125 2023-11-27 10:21:55,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5 2023-11-27 10:21:58,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3852186.6666666665, ans=0.05 2023-11-27 10:22:00,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3852186.6666666665, ans=0.015 2023-11-27 10:22:08,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3852186.6666666665, ans=0.09899494936611666 2023-11-27 10:22:17,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3852253.3333333335, ans=0.125 2023-11-27 10:22:20,672 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 700, loss[loss=0.07025, simple_loss=0.08805, pruned_loss=0.017, audio_tagging_loss=0.009226, over 15420.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09061, pruned_loss=0.01225, audio_tagging_loss=0.008761, over 2961513.14 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:22:20,743 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577850 2023-11-27 10:22:21,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3852320.0, ans=0.125 2023-11-27 10:22:37,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 9.117e+01 9.739e+01 1.041e+02 1.243e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 10:22:49,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3852453.3333333335, ans=0.09899494936611666 2023-11-27 10:22:56,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3852520.0, ans=0.0 2023-11-27 10:23:16,083 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 750, loss[loss=0.06142, simple_loss=0.08544, pruned_loss=0.009851, audio_tagging_loss=0.008848, over 15354.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09062, pruned_loss=0.01223, audio_tagging_loss=0.008776, over 2979838.95 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:23:16,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577900 2023-11-27 10:23:16,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3852653.3333333335, ans=0.125 2023-11-27 10:23:24,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3852653.3333333335, ans=0.125 2023-11-27 10:23:24,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=12.0 2023-11-27 10:23:36,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2023-11-27 10:23:40,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=12.0 2023-11-27 10:24:03,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0 2023-11-27 10:24:11,162 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 800, loss[loss=0.048, simple_loss=0.05794, pruned_loss=0.007799, audio_tagging_loss=0.01123, over 15769.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08994, pruned_loss=0.01222, audio_tagging_loss=0.008849, over 3001575.54 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:24:11,230 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577950 2023-11-27 10:24:12,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0 2023-11-27 10:24:26,902 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.085e+01 9.807e+01 1.032e+02 1.313e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-27 10:24:27,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.88 vs. limit=10.0 2023-11-27 10:24:30,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3853053.3333333335, ans=0.0 2023-11-27 10:24:30,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3853053.3333333335, ans=0.09899494936611666 2023-11-27 10:24:30,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3853053.3333333335, ans=0.09899494936611666 2023-11-27 10:24:32,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3853120.0, ans=0.2 2023-11-27 10:24:41,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3853120.0, ans=0.125 2023-11-27 10:24:45,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0 2023-11-27 10:25:05,440 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 850, loss[loss=0.0643, simple_loss=0.08954, pruned_loss=0.01404, audio_tagging_loss=0.00549, over 14505.00 frames. 
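
A note on grad_scale: this is the mixed-precision loss scale. PyTorch's GradScaler halves the scale whenever scaled gradients overflow and doubles it after a long run of clean steps, which is what moves the logged value between 16.0 and 32.0 here (and down to 8.0 later in this epoch). A standard AMP step as a sketch; the growth_interval shown is PyTorch's default, not a value taken from this run:

    import torch

    scaler = torch.cuda.amp.GradScaler(growth_interval=2000)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)   # mixed fp16/fp32 forward
        scaler.scale(loss).backward()           # backward on the scaled loss
        scaler.step(optimizer)                  # skipped if grads overflowed
        scaler.update()                         # halve on overflow, grow when clean
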
], tot_loss[loss=0.06589, simple_loss=0.08986, pruned_loss=0.0121, audio_tagging_loss=0.008856, over 3015054.43 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:25:05,500 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578000 2023-11-27 10:25:10,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3853320.0, ans=0.0 2023-11-27 10:25:15,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3853386.6666666665, ans=0.1 2023-11-27 10:25:19,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3853386.6666666665, ans=0.125 2023-11-27 10:25:20,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3853386.6666666665, ans=0.0 2023-11-27 10:25:22,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3853386.6666666665, ans=0.0 2023-11-27 10:25:36,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3853453.3333333335, ans=0.125 2023-11-27 10:25:46,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2023-11-27 10:25:49,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2023-11-27 10:25:53,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3853586.6666666665, ans=0.125 2023-11-27 10:26:01,083 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 900, loss[loss=0.04983, simple_loss=0.05791, pruned_loss=0.009491, audio_tagging_loss=0.01139, over 15602.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08936, pruned_loss=0.01199, audio_tagging_loss=0.008877, over 3016976.23 frames. 
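
A note on the scaling.py:1022 "Whitening" lines: each Whiten module measures how far a layer's channel covariance is from a multiple of the identity. The metric is about 1.0 for perfectly "white" features and grows with correlation, and a corrective gradient is applied only when it exceeds limit, so entries like "metric=5.58 vs. limit=15.0" are passive measurements. A rough sketch of such a metric (single-group case; the exact formula in scaling.py is assumed, not copied):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """~1.0 when the channel covariance is proportional to the identity,
        larger when channels are correlated or unevenly scaled."""
        x = x.reshape(-1, x.shape[-1]).float()
        x = x - x.mean(dim=0, keepdim=True)      # zero-mean per channel
        cov = (x.t() @ x) / x.shape[0]           # (C, C) covariance
        num_channels = cov.shape[0]
        mean_diag = torch.diagonal(cov).mean()
        return (cov * cov).sum() / (num_channels * mean_diag ** 2 + 1e-20)

    # whitening_metric(torch.randn(10_000, 256)) is ~1.0; fully correlated
    # channels push it up toward num_channels.
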
], batch size: 62, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:26:01,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578050 2023-11-27 10:26:04,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3853653.3333333335, ans=0.05 2023-11-27 10:26:09,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3853653.3333333335, ans=0.2 2023-11-27 10:26:14,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3853720.0, ans=0.0 2023-11-27 10:26:19,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.811e+01 9.242e+01 9.846e+01 1.086e+02 1.686e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-27 10:26:20,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3853720.0, ans=0.125 2023-11-27 10:26:47,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3853920.0, ans=0.1 2023-11-27 10:26:47,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3853920.0, ans=0.0 2023-11-27 10:26:56,615 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 950, loss[loss=0.0777, simple_loss=0.1073, pruned_loss=0.01583, audio_tagging_loss=0.008197, over 15211.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09013, pruned_loss=0.01197, audio_tagging_loss=0.008735, over 3028832.72 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:26:56,674 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578100 2023-11-27 10:27:13,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-11-27 10:27:17,425 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.095e-03 2023-11-27 10:27:19,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3854120.0, ans=0.125 2023-11-27 10:27:21,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3854120.0, ans=0.125 2023-11-27 10:27:26,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3854120.0, ans=0.125 2023-11-27 10:27:51,636 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1000, loss[loss=0.05841, simple_loss=0.07521, pruned_loss=0.01131, audio_tagging_loss=0.009491, over 14907.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08984, pruned_loss=0.01194, audio_tagging_loss=0.00865, over 3028118.11 frames. 
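
A note on the optim.py:476 lines: the five numbers after "grad-norm quartiles" look like the min, lower quartile, median, upper quartile and max of recently observed gradient norms, and the clipping threshold tracks Clipping_scale times the median; in the entry above, 2.0 * 9.846e+01 = 1.969e+02 exactly. percent-clipped=0.0 means no recent batch exceeded it. A sketch of median-based clipping under those assumptions (class name and window size are illustrative; in optim.py this presumably happens inside the optimizer step):

    import torch

    class MedianGradClipper:
        """Keep a window of recent global grad norms and clip to
        clipping_scale * median, per the logged threshold relationship."""
        def __init__(self, clipping_scale: float = 2.0, window: int = 400):
            self.clipping_scale = clipping_scale
            self.window = window
            self.norms = []

        def clip_(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(
                torch.stack([p.grad.norm() for p in params])).item()
            self.norms = (self.norms + [norm])[-self.window:]
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)   # counted in percent-clipped
            return norm
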
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:27:51,705 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578150 2023-11-27 10:28:09,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 9.145e+01 9.757e+01 1.033e+02 1.378e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 10:28:13,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3854453.3333333335, ans=0.125 2023-11-27 10:28:15,063 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:28:18,390 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:28:40,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3854586.6666666665, ans=0.1 2023-11-27 10:28:40,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3854586.6666666665, ans=0.125 2023-11-27 10:28:46,181 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1050, loss[loss=0.06071, simple_loss=0.07988, pruned_loss=0.01456, audio_tagging_loss=0.006211, over 15479.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08944, pruned_loss=0.01181, audio_tagging_loss=0.008556, over 3032086.16 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:28:46,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578200 2023-11-27 10:28:54,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3854653.3333333335, ans=0.2 2023-11-27 10:29:02,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3854720.0, ans=0.125 2023-11-27 10:29:04,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3854720.0, ans=0.2 2023-11-27 10:29:04,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5 2023-11-27 10:29:09,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3854786.6666666665, ans=0.0 2023-11-27 10:29:31,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3854920.0, ans=0.125 2023-11-27 10:29:34,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=22.5 2023-11-27 10:29:41,662 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1100, loss[loss=0.06017, simple_loss=0.0694, pruned_loss=0.01441, audio_tagging_loss=0.01107, over 14143.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08895, pruned_loss=0.01178, audio_tagging_loss=0.008525, over 3036935.92 frames. 
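
A note on the WARNING above: this is the datamodule's utterance filter. A 1-second AudioSet clip has 100 feature frames, which the encoder's ~4x convolutional subsampling reduces to 23, but its placeholder transcript encodes to 24 BPE tokens, and the pruned-transducer loss cannot usefully align more tokens than it has output frames, so the cut is dropped from ASR training. (AudioSet cuts carry dummy text because they contribute audio-tagging targets, not transcripts.) A sketch of the filter; `sp` is assumed to be the run's SentencePiece BPE model and the names are illustrative:

    # The subsampling arithmetic below reproduces the logged 100 -> 23.
    def keep_cut(cut, sp) -> bool:
        T = ((cut.num_frames - 7) // 2 + 1) // 2    # frames after ~4x subsampling
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        return T >= len(tokens)                     # 23 < 24 -> cut excluded
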
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:29:41,725 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578250 2023-11-27 10:29:43,839 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:29:57,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3855053.3333333335, ans=0.0 2023-11-27 10:29:58,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 8.982e+01 9.681e+01 1.049e+02 1.414e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 10:30:20,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2023-11-27 10:30:29,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3855253.3333333335, ans=0.125 2023-11-27 10:30:33,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3855253.3333333335, ans=0.125 2023-11-27 10:30:36,811 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1150, loss[loss=0.05808, simple_loss=0.08093, pruned_loss=0.009223, audio_tagging_loss=0.008394, over 15805.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08899, pruned_loss=0.0118, audio_tagging_loss=0.00851, over 3036170.18 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:30:36,875 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578300 2023-11-27 10:31:16,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3855520.0, ans=0.125 2023-11-27 10:31:17,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3855520.0, ans=0.0 2023-11-27 10:31:21,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-27 10:31:21,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-27 10:31:22,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.37 vs. limit=6.0 2023-11-27 10:31:26,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.21 vs. limit=22.5 2023-11-27 10:31:31,595 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1200, loss[loss=0.05693, simple_loss=0.07472, pruned_loss=0.01163, audio_tagging_loss=0.007947, over 15682.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08806, pruned_loss=0.01165, audio_tagging_loss=0.008512, over 3030855.96 frames. 
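
A note on the tot_loss frame counts: they rise from ~2.18M frames at batch 250 toward a plateau around 3.0M and stay there, taking fractional values along the way. That is the signature of an exponentially decayed sum rather than a plain total: with batches of roughly 15k frames and a decay of 1 - 1/200 (assuming a 200-batch averaging interval), the steady state is about 200 * 15,000 = 3.0M frames. A sketch under those assumptions (class name is illustrative):

    class RunningLoss:
        """Exponential moving sums behind the logged 'tot_loss ... over N frames'."""
        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float) -> None:
            self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
            self.frames = self.frames * self.decay + batch_frames

        @property
        def per_frame(self) -> float:
            return self.loss_sum / self.frames      # the logged tot_loss value
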
], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:31:31,655 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578350 2023-11-27 10:31:34,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3855653.3333333335, ans=0.125 2023-11-27 10:31:40,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2023-11-27 10:31:46,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3855720.0, ans=0.05 2023-11-27 10:31:46,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3855720.0, ans=0.0 2023-11-27 10:31:49,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 9.092e+01 9.675e+01 1.031e+02 1.166e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 10:31:59,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3855786.6666666665, ans=0.125 2023-11-27 10:32:20,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3855920.0, ans=0.125 2023-11-27 10:32:22,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3855920.0, ans=0.125 2023-11-27 10:32:27,225 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1250, loss[loss=0.06481, simple_loss=0.09273, pruned_loss=0.008058, audio_tagging_loss=0.01038, over 14682.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08806, pruned_loss=0.01176, audio_tagging_loss=0.008498, over 3023912.57 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:32:27,284 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578400 2023-11-27 10:32:32,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3855986.6666666665, ans=0.0 2023-11-27 10:33:03,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3856186.6666666665, ans=0.125 2023-11-27 10:33:05,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3856186.6666666665, ans=0.2 2023-11-27 10:33:12,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3856253.3333333335, ans=10.0 2023-11-27 10:33:21,806 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1300, loss[loss=0.06319, simple_loss=0.08896, pruned_loss=0.01191, audio_tagging_loss=0.006799, over 14397.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08815, pruned_loss=0.01185, audio_tagging_loss=0.008485, over 3027488.72 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:33:21,870 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578450 2023-11-27 10:33:35,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3856386.6666666665, ans=0.0 2023-11-27 10:33:39,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.313e+01 9.062e+01 9.714e+01 1.030e+02 1.237e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 10:33:44,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3856453.3333333335, ans=0.2 2023-11-27 10:33:53,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=12.0 2023-11-27 10:34:11,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2023-11-27 10:34:17,146 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1350, loss[loss=0.08456, simple_loss=0.1165, pruned_loss=0.0157, audio_tagging_loss=0.01059, over 15214.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08831, pruned_loss=0.01181, audio_tagging_loss=0.008523, over 3031642.96 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:34:17,210 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578500 2023-11-27 10:34:37,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3856720.0, ans=0.125 2023-11-27 10:34:45,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3856786.6666666665, ans=0.2 2023-11-27 10:34:55,366 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:35:12,562 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1400, loss[loss=0.08554, simple_loss=0.1248, pruned_loss=0.01855, audio_tagging_loss=0.004567, over 15591.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08782, pruned_loss=0.01178, audio_tagging_loss=0.008603, over 3041328.79 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:35:12,624 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578550 2023-11-27 10:35:18,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3856986.6666666665, ans=0.2 2023-11-27 10:35:23,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3857053.3333333335, ans=0.0 2023-11-27 10:35:28,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3857053.3333333335, ans=0.125 2023-11-27 10:35:30,483 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 9.274e+01 9.843e+01 1.071e+02 1.343e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-27 10:35:33,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3857120.0, ans=0.125 2023-11-27 10:35:44,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=12.0 2023-11-27 10:35:46,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3857186.6666666665, ans=0.0 2023-11-27 10:35:48,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3857186.6666666665, ans=0.0 2023-11-27 10:36:07,248 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1450, loss[loss=0.08442, simple_loss=0.1256, pruned_loss=0.0142, audio_tagging_loss=0.00744, over 15314.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08774, pruned_loss=0.01177, audio_tagging_loss=0.008661, over 3037299.33 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:36:07,316 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578600 2023-11-27 10:36:29,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3857453.3333333335, ans=0.125 2023-11-27 10:36:34,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3857453.3333333335, ans=0.1 2023-11-27 10:36:50,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3857586.6666666665, ans=0.125 2023-11-27 10:36:52,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3857586.6666666665, ans=0.1 2023-11-27 10:36:58,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-27 10:37:02,053 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1500, loss[loss=0.06243, simple_loss=0.07866, pruned_loss=0.01189, audio_tagging_loss=0.01121, over 15222.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08907, pruned_loss=0.01193, audio_tagging_loss=0.008724, over 3035656.80 frames. 
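
A note on the constant lr: 1.38e-03: this deep into training, both factors of the Eden learning-rate schedule change very slowly, so the value is flat across thousands of batches. Assuming the usual Eden hyperparameters for this recipe (base_lr 0.045, lr_batches 7500, lr_epochs 3.5), the formula reproduces the logged value; a sketch (function name illustrative, exact epoch/batch bookkeeping in optim.py assumed):

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden schedule as assumed from icefall's optim.py."""
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    # eden_lr(0.045, batch=578_500, epoch=48) ~= 1.38e-03, as logged.
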
], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:37:02,125 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578650 2023-11-27 10:37:07,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3857653.3333333335, ans=0.125 2023-11-27 10:37:11,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.43 vs. limit=22.5 2023-11-27 10:37:16,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3857720.0, ans=0.1 2023-11-27 10:37:21,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.188e+01 9.715e+01 1.038e+02 1.214e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 10:37:23,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3857786.6666666665, ans=0.2 2023-11-27 10:37:30,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2023-11-27 10:37:43,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3857853.3333333335, ans=0.125 2023-11-27 10:37:57,654 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1550, loss[loss=0.04125, simple_loss=0.03512, pruned_loss=0.006218, audio_tagging_loss=0.01747, over 13946.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08962, pruned_loss=0.0121, audio_tagging_loss=0.008814, over 3034631.16 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:37:57,715 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578700 2023-11-27 10:38:06,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3857986.6666666665, ans=0.1 2023-11-27 10:38:07,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3858053.3333333335, ans=0.95 2023-11-27 10:38:16,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3858053.3333333335, ans=0.0 2023-11-27 10:38:22,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3858120.0, ans=0.0 2023-11-27 10:38:47,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2023-11-27 10:38:52,548 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1600, loss[loss=0.06027, simple_loss=0.07961, pruned_loss=0.01117, audio_tagging_loss=0.009295, over 15474.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08894, pruned_loss=0.0121, audio_tagging_loss=0.008879, over 3038944.34 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:38:52,611 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578750 2023-11-27 10:38:59,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.84 vs. 
limit=15.0 2023-11-27 10:39:04,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3858386.6666666665, ans=0.1 2023-11-27 10:39:04,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2023-11-27 10:39:07,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3858386.6666666665, ans=0.125 2023-11-27 10:39:10,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 9.050e+01 9.679e+01 1.052e+02 1.346e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 10:39:34,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3858520.0, ans=0.0 2023-11-27 10:39:35,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3858586.6666666665, ans=0.0 2023-11-27 10:39:43,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.44 vs. limit=22.5 2023-11-27 10:39:46,743 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1650, loss[loss=0.0702, simple_loss=0.09953, pruned_loss=0.01256, audio_tagging_loss=0.00787, over 15278.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08973, pruned_loss=0.01226, audio_tagging_loss=0.008876, over 3047786.98 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:39:46,804 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578800 2023-11-27 10:40:07,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2023-11-27 10:40:43,184 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1700, loss[loss=0.06807, simple_loss=0.09781, pruned_loss=0.008921, audio_tagging_loss=0.01024, over 14860.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08893, pruned_loss=0.01202, audio_tagging_loss=0.008884, over 3042533.77 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:40:43,248 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578850 2023-11-27 10:40:43,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3858986.6666666665, ans=0.0 2023-11-27 10:40:44,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3858986.6666666665, ans=0.0 2023-11-27 10:40:47,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3858986.6666666665, ans=0.5 2023-11-27 10:40:51,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.78 vs. 
limit=15.0 2023-11-27 10:40:53,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3859053.3333333335, ans=0.125 2023-11-27 10:41:00,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3859053.3333333335, ans=0.0 2023-11-27 10:41:02,525 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 9.167e+01 9.822e+01 1.054e+02 1.344e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-27 10:41:38,515 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1750, loss[loss=0.06612, simple_loss=0.0935, pruned_loss=0.01106, audio_tagging_loss=0.008306, over 15092.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08874, pruned_loss=0.01203, audio_tagging_loss=0.00884, over 3046811.47 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:41:38,584 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578900 2023-11-27 10:41:56,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3859386.6666666665, ans=0.125 2023-11-27 10:42:27,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2023-11-27 10:42:31,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.38 vs. limit=22.5 2023-11-27 10:42:32,883 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1800, loss[loss=0.04972, simple_loss=0.06815, pruned_loss=0.006854, audio_tagging_loss=0.008794, over 15561.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08965, pruned_loss=0.01203, audio_tagging_loss=0.008663, over 3054736.15 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:42:32,948 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578950 2023-11-27 10:42:50,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3859720.0, ans=0.125 2023-11-27 10:42:53,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.995e+01 9.639e+01 1.040e+02 1.222e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 10:43:00,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3859786.6666666665, ans=0.125 2023-11-27 10:43:04,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3859786.6666666665, ans=0.125 2023-11-27 10:43:04,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3859786.6666666665, ans=0.125 2023-11-27 10:43:19,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3859920.0, ans=0.125 2023-11-27 10:43:20,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3859920.0, ans=0.125 2023-11-27 10:43:28,238 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1850, loss[loss=0.09149, simple_loss=0.1297, pruned_loss=0.01996, audio_tagging_loss=0.006711, over 15658.00 frames. 
], tot_loss[loss=0.06554, simple_loss=0.08963, pruned_loss=0.0121, audio_tagging_loss=0.008629, over 3052466.84 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:43:28,315 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579000 2023-11-27 10:43:28,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3859986.6666666665, ans=0.125 2023-11-27 10:43:46,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3860053.3333333335, ans=0.035 2023-11-27 10:43:59,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0 2023-11-27 10:44:02,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3860186.6666666665, ans=0.1 2023-11-27 10:44:13,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3860253.3333333335, ans=0.125 2023-11-27 10:44:23,739 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1900, loss[loss=0.07353, simple_loss=0.1033, pruned_loss=0.01536, audio_tagging_loss=0.006537, over 15857.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08914, pruned_loss=0.012, audio_tagging_loss=0.008638, over 3050762.72 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 8.0 2023-11-27 10:44:23,813 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579050 2023-11-27 10:44:28,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3860320.0, ans=0.0 2023-11-27 10:44:31,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3860320.0, ans=0.125 2023-11-27 10:44:35,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3860386.6666666665, ans=0.1 2023-11-27 10:44:36,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3860386.6666666665, ans=0.125 2023-11-27 10:44:39,997 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:44:44,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 9.131e+01 9.734e+01 1.046e+02 1.295e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 10:45:05,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.77 vs. limit=22.5 2023-11-27 10:45:08,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3860586.6666666665, ans=0.025 2023-11-27 10:45:08,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.85 vs. limit=22.5 2023-11-27 10:45:12,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.20 vs. 
limit=22.5 2023-11-27 10:45:13,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3860586.6666666665, ans=0.125 2023-11-27 10:45:18,684 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1950, loss[loss=0.07862, simple_loss=0.1039, pruned_loss=0.01561, audio_tagging_loss=0.01109, over 15453.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08879, pruned_loss=0.01197, audio_tagging_loss=0.008579, over 3049634.00 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 8.0 2023-11-27 10:45:18,749 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579100 2023-11-27 10:45:33,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3860720.0, ans=0.125 2023-11-27 10:45:33,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2023-11-27 10:45:47,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3860786.6666666665, ans=0.125 2023-11-27 10:45:52,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3860853.3333333335, ans=10.0 2023-11-27 10:46:10,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3860920.0, ans=0.125 2023-11-27 10:46:13,738 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2000, loss[loss=0.07811, simple_loss=0.1134, pruned_loss=0.0139, audio_tagging_loss=0.007508, over 13472.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08879, pruned_loss=0.012, audio_tagging_loss=0.008525, over 3043270.92 frames. ], batch size: 52, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:46:13,801 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579150 2023-11-27 10:46:24,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3861053.3333333335, ans=0.0 2023-11-27 10:46:35,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 8.839e+01 9.475e+01 1.022e+02 1.680e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 10:46:48,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3861186.6666666665, ans=0.0 2023-11-27 10:46:49,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3861186.6666666665, ans=0.0 2023-11-27 10:47:02,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-27 10:47:10,266 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2050, loss[loss=0.0738, simple_loss=0.1001, pruned_loss=0.01545, audio_tagging_loss=0.008309, over 16215.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08895, pruned_loss=0.01203, audio_tagging_loss=0.008546, over 3044577.78 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:47:10,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579200 2023-11-27 10:47:10,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.54 vs. 
limit=12.0 2023-11-27 10:47:27,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3861386.6666666665, ans=0.07 2023-11-27 10:47:35,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3861453.3333333335, ans=0.125 2023-11-27 10:47:44,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.59 vs. limit=22.5 2023-11-27 10:47:47,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3861520.0, ans=0.125 2023-11-27 10:47:51,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3861520.0, ans=0.1 2023-11-27 10:48:07,619 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2100, loss[loss=0.06463, simple_loss=0.09589, pruned_loss=0.009536, audio_tagging_loss=0.007148, over 14747.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08934, pruned_loss=0.01206, audio_tagging_loss=0.008461, over 3037862.16 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:48:07,690 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579250 2023-11-27 10:48:23,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3861720.0, ans=0.125 2023-11-27 10:48:27,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3861720.0, ans=0.125 2023-11-27 10:48:28,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 8.950e+01 9.629e+01 1.055e+02 1.441e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 10:48:30,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3861786.6666666665, ans=0.0 2023-11-27 10:48:33,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3861786.6666666665, ans=0.1 2023-11-27 10:48:48,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-11-27 10:48:57,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3861920.0, ans=0.125 2023-11-27 10:49:03,228 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2150, loss[loss=0.06127, simple_loss=0.08093, pruned_loss=0.01096, audio_tagging_loss=0.009842, over 16490.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08904, pruned_loss=0.01201, audio_tagging_loss=0.008559, over 3037223.95 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:49:03,301 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579300 2023-11-27 10:49:03,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=12.0 2023-11-27 10:49:11,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.02 vs. 
limit=15.0 2023-11-27 10:49:16,127 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:49:35,560 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:49:36,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=12.0 2023-11-27 10:49:43,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3862186.6666666665, ans=0.125 2023-11-27 10:49:59,855 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2200, loss[loss=0.06315, simple_loss=0.09148, pruned_loss=0.009861, audio_tagging_loss=0.007549, over 16180.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08979, pruned_loss=0.01207, audio_tagging_loss=0.008534, over 3042527.31 frames. ], batch size: 63, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:49:59,978 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579350 2023-11-27 10:50:00,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3862320.0, ans=0.125
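
A final note on the scaling.py:1118 "WithLoss" lines that recur above: these report auxiliary penalties attached directly to attention-weight tensors. loss-sum=0.000e+00 means the penalty was inactive for that module at that step; a small nonzero value (e.g. the 5.095e-03 seen earlier this epoch) means a mild correction flowed through the backward pass without changing the reported training loss. One way to attach such a loss, as a sketch of the assumed mechanism:

    import torch

    class WithLoss(torch.autograd.Function):
        """Return x unchanged, but give `aux_loss` a gradient of 1.0 in
        backward, effectively adding it to the optimized objective without
        altering the main loss that gets logged."""
        @staticmethod
        def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor):
            ctx.save_for_backward(aux_loss)
            return x

        @staticmethod
        def backward(ctx, x_grad: torch.Tensor):
            (aux_loss,) = ctx.saved_tensors
            return x_grad, torch.ones_like(aux_loss)

    # usage: attn_weights = WithLoss.apply(attn_weights, penalty(attn_weights))
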